This data analysis was performed by Alan Rial for the Nuclear Transparency Project (NTP) and was finalized on 10/24/2022. Its main objective is to assess the quality of open-access radionuclide release datasets by identifying patterns, trends, unusual values, and missing data, in order to inform regulatory legal submissions related to nuclear transparency and data disclosure for the Canadian Nuclear Safety Commission (CNSC).
4 Categories of facilities to analyze:
Open Data Source: https://open.canada.ca/data/en/dataset/6ed50cd9-0d8c-471b-a5f6-26088298870e
About NTP:
The Nuclear Transparency Project (NTP) is a Canadian-registered not-for-profit organization dedicated to supporting open, informed, and equitable public discourse on nuclear technologies. NTP advocates for robust public access to data and other types of information and helps to produce accessible analysis of publicly available information, all with a view to supporting greater transparency in the Canadian nuclear sector. NTP is comprised of a multi-disciplinary group of experts working to examine the economic, ecological, and social facets and impacts of the Canadian nuclear sector. The organization produces public reports, academic articles, and other publicly accessible resources. It also regularly intervenes in nuclear regulatory decision-making processes. The organization seeks to support youth and early career scholars, especially those from underrepresented communities in their respective disciplines. NTP also recognizes a responsibility to model the transparency and accountability practices for which it advocates. We are committed to interdisciplinary, cross-sectoral, and equitable collaborations and dialogue between regulators, industry, civil society, members of host and potential host communities, as well as academics and professionals from science, technology, engineering and math (STEM) fields, the social sciences, and humanities.
For more information, please refer to https://nucleartransparency.ca/
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
pd.options.mode.chained_assignment = None #disabling the "SettingWithCopyWarning".
df_npp = pd.read_csv("./Datasets/Nuclear Power Plants.csv")
df_npp
| | Year \| Année | NPRI ID \| ID INRP | Company Name \| Raison Sociale | Facility Name \| Nom de l'installation | City \| Ville | CSD \| SDR | CA or CMA \| AR ou RMR | Economic Region \| Région économique | Province \| Province | Latitude \| Latitude | Longitude \| Longitude | Substance Name (English) \| Nom de substance (Anglais) | Substance Name (French) \| Nom de substance (Français) | Units \| Unités | < | Stack Emissions \| Émissions de cheminées | <.1 | Direct Discharge \| Évacuations directes | Footnotes \| Notes de bas de page |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021 | 1445 | Hydro-Québec | Gentilly-2 | Bécancour | NaN | NaN | NaN | QC | 46.3958 | -72.3569 | Tritium (HTO) | Tritium (Eau tritiée) | Bq | NaN | 4.43E+13 | NaN | 1.56E+14 | NaN |
| 1 | 2021 | 1445 | Hydro-Québec | Gentilly-2 | Bécancour | NaN | NaN | NaN | QC | 46.3958 | -72.3569 | Carbon-14 | Carbone-14 | Bq | NaN | 6.17E+09 | NaN | 6.07E+07 | NaN |
| 2 | 2021 | 1445 | Hydro-Québec | Gentilly-2 | Bécancour | NaN | NaN | NaN | QC | 46.3958 | -72.3569 | Total noble gases | Total des gaz nobles | Bq-MeV | NaN | NRM \| NRS | NaN | NRM \| NRS | NaN |
| 3 | 2021 | 1445 | Hydro-Québec | Gentilly-2 | Bécancour | NaN | NaN | NaN | QC | 46.3958 | -72.3569 | Iodine-131 | Iode-131 | Bq | NaN | NRM \| NRS | NaN | NRM \| NRS | NaN |
| 4 | 2021 | 1445 | Hydro-Québec | Gentilly-2 | Bécancour | NaN | NaN | NaN | QC | 46.3958 | -72.3569 | Particulate (gross beta/gamma) | Particules (bêta brutes/gamma brutes) | Bq | NaN | 5.11E+05 | NaN | 7.11E+07 | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 494 | 2011 | 3161 | Ontario Power Generation Inc. | Pickering Nuclear - B | Pickering | Pickering | Toronto | Toronto | ON | 43.8104 | -79.0676 | Total noble gases | Total des gaz nobles | Bq-MeV | NaN | 8.40E+13 | NaN | NRM \| NRS | NaN |
| 495 | 2011 | 3161 | Ontario Power Generation Inc. | Pickering Nuclear - B | Pickering | Pickering | Toronto | Toronto | ON | 43.8104 | -79.0676 | Iodine-131 | Iode-131 | Bq | NaN | 8.80E+06 | NaN | NRM \| NRS | NaN |
| 496 | 2011 | 3161 | Ontario Power Generation Inc. | Pickering Nuclear - B | Pickering | Pickering | Toronto | Toronto | ON | 43.8104 | -79.0676 | Particulate (gross beta/gamma) | Particules (bêta brutes/gamma brutes) | Bq | NaN | 3.60E+06 | NaN | 1.40E+10 | NaN |
| 497 | 2011 | 3161 | Ontario Power Generation Inc. | Pickering Nuclear - B | Pickering | Pickering | Toronto | Toronto | ON | 43.8104 | -79.0676 | Particulate gross alpha | Particules alpha brutes | Bq | NaN | NRM \| NRS | NaN | 4.80E+07 | NaN |
| 498 | 2011 | 3163 | Ontario Power Generation Inc. | Pickering Nuclear - A & B | Pickering | Pickering | Toronto | Toronto | ON | 43.8104 | -79.0676 | Estimated public dose (see footnote) | Dose estimée au public (voir note de bas de page) | mSv/a | NaN | 0.0009 | NaN | NRM \| NRS | Estimated public dose is calculated incorporat... |
499 rows × 19 columns
(1) Why does the Data start in 2011? Can we get older data?
# I'm creating a copy because I will need it later.
df_npp_0 = df_npp.copy()
df_npp["Facility Name | Nom de l'installation"].unique()
array(['Gentilly-2', 'Point Lepreau Generating Station',
'Bruce Power - A', 'Bruce Power - B', 'Bruce Power Site',
'Darlington Nuclear', 'Pickering Nuclear - A & B',
'Pickering Nuclear - A', 'Pickering Nuclear - B',
'Bruce Power Site '], dtype=object)
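The list above contains both 'Bruce Power Site' and 'Bruce Power Site ' (with a trailing space). A minimal sketch of how such near-duplicate labels could be detected automatically, assuming the only differences are stray whitespace; the `names` series here is illustrative and not read from the CSV:

```python
import pandas as pd

# Illustrative labels only; the notebook itself reads them from the CSV.
names = pd.Series(["Bruce Power Site", "Bruce Power Site ", "Gentilly-2"])

# Normalize: strip outer whitespace and collapse internal runs of spaces.
normalized = names.str.strip().str.replace(r"\s+", " ", regex=True)

# Collect the raw spellings found under each normalized label and keep
# only the labels that appear with more than one raw spelling.
variants = names.groupby(normalized).unique()
suspect = variants[variants.map(len) > 1]
print(suspect)
```

Any label listed in `suspect` has more than one raw spelling and is a candidate for cleaning before grouping or plotting.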
(2) Why is Pickering Nuclear divided into A & B up to 2018, but combined afterwards? Why is it combined for Estimated public dose in every year?
(3) Why is Bruce Power Site combined for Estimated public dose (but split for the rest of the information)?
# Renaming columns to English only:
df_npp.rename(columns={'Year | Année': 'Year', 'NPRI ID | ID INRP': 'NPRI ID','Company Name | Raison Sociale':'Company Name',"Facility Name | Nom de l'installation":'Facility Name', 'City | Ville':'City', 'CSD | SDR':'CSD','CA or CMA | AR ou RMR':'CA or CMA', 'Economic Region | Région économique':'Economic Region','Province | Province':'Province', 'Latitude | Latitude':'Latitude', 'Longitude | Longitude':'Longitude','Substance Name (English) | Nom de substance (Anglais)':'Substance Name (English)','Units | Unités':'Units', 'Stack Emissions | Émissions de cheminées':'Stack Emissions','Direct Discharge | Évacuations directes':'Direct Discharge','Footnotes | Notes de bas de page':'Footnotes'}, inplace = True)
df_npp.head()
| | Year | NPRI ID | Company Name | Facility Name | City | CSD | CA or CMA | Economic Region | Province | Latitude | Longitude | Substance Name (English) | Substance Name (French) \| Nom de substance (Français) | Units | < | Stack Emissions | <.1 | Direct Discharge | Footnotes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021 | 1445 | Hydro-Québec | Gentilly-2 | Bécancour | NaN | NaN | NaN | QC | 46.3958 | -72.3569 | Tritium (HTO) | Tritium (Eau tritiée) | Bq | NaN | 4.43E+13 | NaN | 1.56E+14 | NaN |
| 1 | 2021 | 1445 | Hydro-Québec | Gentilly-2 | Bécancour | NaN | NaN | NaN | QC | 46.3958 | -72.3569 | Carbon-14 | Carbone-14 | Bq | NaN | 6.17E+09 | NaN | 6.07E+07 | NaN |
| 2 | 2021 | 1445 | Hydro-Québec | Gentilly-2 | Bécancour | NaN | NaN | NaN | QC | 46.3958 | -72.3569 | Total noble gases | Total des gaz nobles | Bq-MeV | NaN | NRM \| NRS | NaN | NRM \| NRS | NaN |
| 3 | 2021 | 1445 | Hydro-Québec | Gentilly-2 | Bécancour | NaN | NaN | NaN | QC | 46.3958 | -72.3569 | Iodine-131 | Iode-131 | Bq | NaN | NRM \| NRS | NaN | NRM \| NRS | NaN |
| 4 | 2021 | 1445 | Hydro-Québec | Gentilly-2 | Bécancour | NaN | NaN | NaN | QC | 46.3958 | -72.3569 | Particulate (gross beta/gamma) | Particules (bêta brutes/gamma brutes) | Bq | NaN | 5.11E+05 | NaN | 7.11E+07 | NaN |
I noticed that some values are reported as "LD" (Level of Detection) or "NRM" (Not Required to Monitor) instead of numbers. I will summarize which values are reported this way before replacing them with zeros so that the columns can be plotted:
# Stack Emission column first:
df_npp_miss_stack = df_npp[df_npp['Stack Emissions'].isin(['LD', 'NRM | NRS'])]
df_npp_miss_stack[['Year','Facility Name', 'Substance Name (English)', 'Stack Emissions']].sort_values(by =['Facility Name', 'Substance Name (English)', 'Year'])
| | Year | Facility Name | Substance Name (English) | Stack Emissions |
|---|---|---|---|---|
| 239 | 2016 | Bruce Power - B | Iodine-131 | LD |
| 484 | 2011 | Darlington Nuclear | Particulate gross alpha | NRM \| NRS |
| 437 | 2012 | Darlington Nuclear | Particulate gross alpha | NRM \| NRS |
| 455 | 2011 | Gentilly-2 | Iodine-131 | LD |
| 361 | 2013 | Gentilly-2 | Iodine-131 | LD |
| 314 | 2014 | Gentilly-2 | Iodine-131 | LD |
| 267 | 2015 | Gentilly-2 | Iodine-131 | NRM \| NRS |
| 220 | 2016 | Gentilly-2 | Iodine-131 | NRM \| NRS |
| 173 | 2017 | Gentilly-2 | Iodine-131 | NRM \| NRS |
| 126 | 2018 | Gentilly-2 | Iodine-131 | NRM \| NRS |
| 85 | 2019 | Gentilly-2 | Iodine-131 | NRM \| NRS |
| 44 | 2020 | Gentilly-2 | Iodine-131 | NRM \| NRS |
| 3 | 2021 | Gentilly-2 | Iodine-131 | NRM \| NRS |
| 266 | 2015 | Gentilly-2 | Total noble gases | NRM \| NRS |
| 219 | 2016 | Gentilly-2 | Total noble gases | NRM \| NRS |
| 172 | 2017 | Gentilly-2 | Total noble gases | NRM \| NRS |
| 125 | 2018 | Gentilly-2 | Total noble gases | NRM \| NRS |
| 84 | 2019 | Gentilly-2 | Total noble gases | NRM \| NRS |
| 43 | 2020 | Gentilly-2 | Total noble gases | NRM \| NRS |
| 2 | 2021 | Gentilly-2 | Total noble gases | NRM \| NRS |
| 491 | 2011 | Pickering Nuclear - A | Particulate gross alpha | NRM \| NRS |
| 444 | 2012 | Pickering Nuclear - A | Particulate gross alpha | NRM \| NRS |
| 497 | 2011 | Pickering Nuclear - B | Particulate gross alpha | NRM \| NRS |
| 450 | 2012 | Pickering Nuclear - B | Particulate gross alpha | NRM \| NRS |
| 461 | 2011 | Point Lepreau Generating Station | Iodine-131 | NRM \| NRS |
| 414 | 2012 | Point Lepreau Generating Station | Iodine-131 | NRM \| NRS |
| 367 | 2013 | Point Lepreau Generating Station | Iodine-131 | NRM \| NRS |
| 320 | 2014 | Point Lepreau Generating Station | Iodine-131 | NRM \| NRS |
| 462 | 2011 | Point Lepreau Generating Station | Particulate (gross beta/gamma) | NRM \| NRS |
| 415 | 2012 | Point Lepreau Generating Station | Particulate (gross beta/gamma) | NRM \| NRS |
| 368 | 2013 | Point Lepreau Generating Station | Particulate (gross beta/gamma) | NRM \| NRS |
| 321 | 2014 | Point Lepreau Generating Station | Particulate (gross beta/gamma) | NRM \| NRS |
| 463 | 2011 | Point Lepreau Generating Station | Particulate gross alpha | NRM \| NRS |
| 416 | 2012 | Point Lepreau Generating Station | Particulate gross alpha | NRM \| NRS |
| 369 | 2013 | Point Lepreau Generating Station | Particulate gross alpha | NRM \| NRS |
| 322 | 2014 | Point Lepreau Generating Station | Particulate gross alpha | NRM \| NRS |
| 275 | 2015 | Point Lepreau Generating Station | Particulate gross alpha | NRM \| NRS |
| 228 | 2016 | Point Lepreau Generating Station | Particulate gross alpha | NRM \| NRS |
| 181 | 2017 | Point Lepreau Generating Station | Particulate gross alpha | NRM \| NRS |
| 134 | 2018 | Point Lepreau Generating Station | Particulate gross alpha | NRM \| NRS |
| 93 | 2019 | Point Lepreau Generating Station | Particulate gross alpha | NRM \| NRS |
| 52 | 2020 | Point Lepreau Generating Station | Particulate gross alpha | NRM \| NRS |
| 11 | 2021 | Point Lepreau Generating Station | Particulate gross alpha | NRM \| NRS |
| 460 | 2011 | Point Lepreau Generating Station | Total noble gases | NRM \| NRS |
(4) Summary of Missing Data (LD / NRM) for Stack Emissions:
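One possible way to condense a listing like this into a summary is a cross-tabulation that counts, per facility and substance, how many reporting years carry each flag. This is a sketch on toy rows (facility names "A" and "B" and all values are placeholders, not CNSC data); only the column names and flag strings are taken from the notebook:

```python
import pandas as pd

# Toy rows for illustration; values are placeholders, not CNSC data.
toy = pd.DataFrame({
    "Facility Name": ["A", "A", "A", "B"],
    "Substance Name (English)": ["Iodine-131", "Iodine-131", "Carbon-14", "Iodine-131"],
    "Stack Emissions": ["LD", "NRM | NRS", "1.0E+09", "NRM | NRS"],
})

flags = ["LD", "NRM | NRS"]
flagged = toy[toy["Stack Emissions"].isin(flags)]

# Rows: facility and substance; columns: flag; cells: number of flagged rows.
summary = pd.crosstab(
    [flagged["Facility Name"], flagged["Substance Name (English)"]],
    flagged["Stack Emissions"],
)
print(summary)
```

Applied to the real dataframe, each cell would give the number of years a facility reported that flag for that substance.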
# Direct Discharge column next:
df_npp_miss_discharge = df_npp[df_npp['Direct Discharge'].isin(['LD', 'NRM | NRS'])]
df_npp_miss_discharge[['Year','Facility Name', 'Substance Name (English)', 'Direct Discharge']].sort_values(by =['Facility Name', 'Substance Name (English)', 'Year'])
| | Year | Facility Name | Substance Name (English) | Direct Discharge |
|---|---|---|---|---|
| 468 | 2011 | Bruce Power - A | Iodine-131 | NRM \| NRS |
| 421 | 2012 | Bruce Power - A | Iodine-131 | NRM \| NRS |
| 374 | 2013 | Bruce Power - A | Iodine-131 | NRM \| NRS |
| 327 | 2014 | Bruce Power - A | Iodine-131 | NRM \| NRS |
| 280 | 2015 | Bruce Power - A | Iodine-131 | NRM \| NRS |
| ... | ... | ... | ... | ... |
| 178 | 2017 | Point Lepreau Generating Station | Total noble gases | NRM \| NRS |
| 131 | 2018 | Point Lepreau Generating Station | Total noble gases | NRM \| NRS |
| 90 | 2019 | Point Lepreau Generating Station | Total noble gases | NRM \| NRS |
| 49 | 2020 | Point Lepreau Generating Station | Total noble gases | NRM \| NRS |
| 8 | 2021 | Point Lepreau Generating Station | Total noble gases | NRM \| NRS |
239 rows × 4 columns
(5) Summary of Missing Data (LD / NRM) for Direct Discharge:
Note 1: Estimated public dose is missing for every Direct Discharge entry because it is reported under Stack Emissions, "incorporating all major release pathways (emissions and discharges)" according to the footnote.
Note 2: Noble gases are missing for every Direct Discharge entry, which seems plausible given their low solubility in water.
Note 3: The only Iodine-131 values reported for Direct Discharge are from Point Lepreau in 2020 and 2021.
# I noticed one value is "0":
df_npp[(df_npp['Stack Emissions'] == '0.00E+00') | (df_npp['Direct Discharge'] == '0.00E+00')]
| | Year | NPRI ID | Company Name | Facility Name | City | CSD | CA or CMA | Economic Region | Province | Latitude | Longitude | Substance Name (English) | Substance Name (French) \| Nom de substance (Français) | Units | < | Stack Emissions | <.1 | Direct Discharge | Footnotes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 22 | 2021 | 7041 | Bruce Power LP | Bruce Power - B | Tiverton | Kincardine | NaN | Stratford--Bruce Peninsula | ON | 44.3289 | -81.5916 | Iodine-131 | Iode-131 | Bq | NaN | 0.00E+00 | NaN | NRM \| NRS | NaN |
(5') Why is the Bruce Power - B Iodine-131 value for 2021 reported as "0.00E+00"? This seems unusual considering previous values.
# Combining Bruce Power A & B and Pickering Nuclear A & B for a geographic reference table, so I can delete them from the dataframe I will use for plotting:
# .copy() makes this an independent dataframe, so the edits below do not act on a slice of df_npp:
df_npp_geography = df_npp[['Facility Name', 'NPRI ID', 'Company Name', 'City', 'CSD', 'CA or CMA', 'Economic Region', 'Province', 'Latitude', 'Longitude']].copy()
# Mapping unit-level names and the trailing-space variant onto one label per site:
df_npp_geography['Facility Name'] = df_npp_geography['Facility Name'].replace({
    'Bruce Power - A': 'Bruce Power Site',
    'Bruce Power - B': 'Bruce Power Site',
    'Bruce Power Site ': 'Bruce Power Site',
    'Pickering Nuclear - A': 'Pickering Nuclear - A & B',
    'Pickering Nuclear - B': 'Pickering Nuclear - A & B',
})
df_npp_geography
| | Facility Name | NPRI ID | Company Name | City | CSD | CA or CMA | Economic Region | Province | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Gentilly-2 | 1445 | Hydro-Québec | Bécancour | NaN | NaN | NaN | QC | 46.3958 | -72.3569 |
| 1 | Gentilly-2 | 1445 | Hydro-Québec | Bécancour | NaN | NaN | NaN | QC | 46.3958 | -72.3569 |
| 2 | Gentilly-2 | 1445 | Hydro-Québec | Bécancour | NaN | NaN | NaN | QC | 46.3958 | -72.3569 |
| 3 | Gentilly-2 | 1445 | Hydro-Québec | Bécancour | NaN | NaN | NaN | QC | 46.3958 | -72.3569 |
| 4 | Gentilly-2 | 1445 | Hydro-Québec | Bécancour | NaN | NaN | NaN | QC | 46.3958 | -72.3569 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 494 | Pickering Nuclear - A & B | 3161 | Ontario Power Generation Inc. | Pickering | Pickering | Toronto | Toronto | ON | 43.8104 | -79.0676 |
| 495 | Pickering Nuclear - A & B | 3161 | Ontario Power Generation Inc. | Pickering | Pickering | Toronto | Toronto | ON | 43.8104 | -79.0676 |
| 496 | Pickering Nuclear - A & B | 3161 | Ontario Power Generation Inc. | Pickering | Pickering | Toronto | Toronto | ON | 43.8104 | -79.0676 |
| 497 | Pickering Nuclear - A & B | 3161 | Ontario Power Generation Inc. | Pickering | Pickering | Toronto | Toronto | ON | 43.8104 | -79.0676 |
| 498 | Pickering Nuclear - A & B | 3163 | Ontario Power Generation Inc. | Pickering | Pickering | Toronto | Toronto | ON | 43.8104 | -79.0676 |
499 rows × 10 columns
# Cleaning the geography dataframe:
df_npp_geography.drop_duplicates(inplace=True)
df_npp_geography = df_npp_geography.reset_index(drop=True)
df_npp_geography
| | Facility Name | NPRI ID | Company Name | City | CSD | CA or CMA | Economic Region | Province | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Gentilly-2 | 1445 | Hydro-Québec | Bécancour | NaN | NaN | NaN | QC | 46.3958 | -72.3569 |
| 1 | Point Lepreau Generating Station | 1710 | New Brunswick Power Corporation | Maces Bay | Musquash | Saint John | Saint John--St. Stephen | NB | 45.0690 | -66.4556 |
| 2 | Bruce Power Site | 7041 | Bruce Power LP | Tiverton | Kincardine | NaN | Stratford--Bruce Peninsula | ON | 44.3289 | -81.5916 |
| 3 | Darlington Nuclear | 3163 | Ontario Power Generation Inc. | Bowmanville | Clarington | Oshawa | Toronto | ON | 43.8681 | -78.7250 |
| 4 | Pickering Nuclear - A & B | 3161 | Ontario Power Generation Inc. | Pickering | Pickering | Toronto | Toronto | ON | 43.8104 | -79.0676 |
| 5 | Pickering Nuclear - A & B | 3163 | Ontario Power Generation Inc. | Pickering | Pickering | Toronto | Toronto | ON | 43.8104 | -79.0676 |
df_npp_geography.at[5, 'NPRI ID'] = 3161
df_npp_geography.drop_duplicates(inplace=True)
df_npp_geography = df_npp_geography.reset_index(drop=True)
df_npp_geography
| | Facility Name | NPRI ID | Company Name | City | CSD | CA or CMA | Economic Region | Province | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Gentilly-2 | 1445 | Hydro-Québec | Bécancour | NaN | NaN | NaN | QC | 46.3958 | -72.3569 |
| 1 | Point Lepreau Generating Station | 1710 | New Brunswick Power Corporation | Maces Bay | Musquash | Saint John | Saint John--St. Stephen | NB | 45.0690 | -66.4556 |
| 2 | Bruce Power Site | 7041 | Bruce Power LP | Tiverton | Kincardine | NaN | Stratford--Bruce Peninsula | ON | 44.3289 | -81.5916 |
| 3 | Darlington Nuclear | 3163 | Ontario Power Generation Inc. | Bowmanville | Clarington | Oshawa | Toronto | ON | 43.8681 | -78.7250 |
| 4 | Pickering Nuclear - A & B | 3161 | Ontario Power Generation Inc. | Pickering | Pickering | Toronto | Toronto | ON | 43.8104 | -79.0676 |
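The duplicate Pickering row was spotted by eye because the same facility appeared with two NPRI IDs. A minimal sketch of how that check could be automated, using toy rows (the site names and IDs below are placeholders chosen for illustration):

```python
import pandas as pd

# Toy reference rows; names and IDs are placeholders, not dataset values.
geo = pd.DataFrame({
    "Facility Name": ["Site X", "Site X", "Site Y"],
    "NPRI ID": [100, 200, 300],
})

# Count distinct IDs per facility; more than one signals a conflict.
ids_per_facility = geo.groupby("Facility Name")["NPRI ID"].nunique()
conflicts = ids_per_facility[ids_per_facility > 1]
print(conflicts)
```

An empty `conflicts` series would indicate that every facility maps to exactly one ID.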
# Cleaning the LD (Level of detection) & NRM (Not required to monitor) values so I can convert the columns into numeric for later plotting:
# Assigning the result back avoids relying on in-place edits of a column selection:
df_npp['Stack Emissions'] = df_npp['Stack Emissions'].replace(['LD', 'NRM | NRS'], 0)
df_npp['Direct Discharge'] = df_npp['Direct Discharge'].replace(['LD', 'NRM | NRS'], 0)
# Row 28 contains a malformed value: it reads 1,2e12 instead of 1.2e12. I will correct it:
df_npp.at[28, 'Stack Emissions'] = 1.2e12
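The fix above targets one known row. As a sketch of a more general check, the snippet below looks for any entry with a comma between two digits and rewrites it with a period. It assumes such commas are always decimal separators (a thousands separator would be converted incorrectly), and the `raw` series is a toy stand-in for the real column:

```python
import pandas as pd

# Toy column; the strings mimic the mixed formats described above.
raw = pd.Series(["4.43E+13", "1,2e12", "LD"])

# Flag entries where a comma sits between two digits.
has_decimal_comma = raw.str.contains(r"\d,\d", regex=True)

# Replace the comma with a period only in the flagged entries.
fixed = raw.where(~has_decimal_comma, raw.str.replace(",", ".", regex=False))
print(fixed.tolist())
```

Printing `raw[has_decimal_comma]` before fixing would list every affected row, which may be useful when reporting data-quality issues.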
# Converted columns to numeric for plotting:
df_npp['Stack Emissions'] = pd.to_numeric(df_npp['Stack Emissions'])
df_npp['Direct Discharge'] = pd.to_numeric(df_npp['Direct Discharge'])
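Replacing LD and NRM with zero makes a "not monitored" entry indistinguishable from a reported zero release. An alternative sketch, not the approach used in this notebook, keeps the original code in a separate status column and converts the flagged entries to NaN, which line plots draw as gaps. The `Stack Status` column name is hypothetical and the rows are toy values:

```python
import pandas as pd

# Toy column with a number, both flags, and a genuine reported zero.
toy = pd.DataFrame({"Stack Emissions": ["4.43E+13", "LD", "NRM | NRS", "0.00E+00"]})

# Keep the original code in a separate (hypothetical) status column.
flags = ["LD", "NRM | NRS"]
is_flag = toy["Stack Emissions"].isin(flags)
toy["Stack Status"] = toy["Stack Emissions"].where(is_flag, "reported")

# Coerce the non-numeric codes to NaN instead of zero.
toy["Stack Emissions"] = pd.to_numeric(toy["Stack Emissions"], errors="coerce")
print(toy)
```

With this layout the reported zero in the last row stays a zero, while the two flagged rows become NaN and remain identifiable through the status column.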
# Combining Pickering Nuclear A & Pickering Nuclear B so I can plot every year (they are reported combined after 2018). Also correcting the name of one Bruce Power data point:
df_npp['Facility Name'] = df_npp['Facility Name'].replace({
    'Bruce Power Site ': 'Bruce Power Site',
    'Pickering Nuclear - A': 'Pickering Nuclear - A & B',
    'Pickering Nuclear - B': 'Pickering Nuclear - A & B',
})
df_npp['Facility Name'].unique()
array(['Gentilly-2', 'Point Lepreau Generating Station',
'Bruce Power - A', 'Bruce Power - B', 'Bruce Power Site',
'Darlington Nuclear', 'Pickering Nuclear - A & B'], dtype=object)
df_npp.drop(columns=['NPRI ID', 'Company Name', 'City', 'CSD', 'CA or CMA', 'Economic Region', 'Province', 'Latitude', 'Longitude', 'Substance Name (French) | Nom de substance (Français)', '<', '<.1', 'Footnotes'], inplace=True)
df_npp.head()
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 0 | 2021 | Gentilly-2 | Tritium (HTO) | Bq | 4.430000e+13 | 1.560000e+14 |
| 1 | 2021 | Gentilly-2 | Carbon-14 | Bq | 6.170000e+09 | 6.070000e+07 |
| 2 | 2021 | Gentilly-2 | Total noble gases | Bq-MeV | 0.000000e+00 | 0.000000e+00 |
| 3 | 2021 | Gentilly-2 | Iodine-131 | Bq | 0.000000e+00 | 0.000000e+00 |
| 4 | 2021 | Gentilly-2 | Particulate (gross beta/gamma) | Bq | 5.110000e+05 | 7.110000e+07 |
# Aggregating Pickering A/B to get one consistent value per year across the whole period (in recent years only the combined site is reported):
df_npp = df_npp.groupby(['Year', 'Facility Name', 'Substance Name (English)', 'Units'],as_index=False).agg({'Stack Emissions': 'sum', 'Direct Discharge': 'sum'})
df_npp.head()
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 0 | 2011 | Bruce Power - A | Carbon-14 | Bq | 1.360000e+12 | 1.700000e+09 |
| 1 | 2011 | Bruce Power - A | Iodine-131 | Bq | 3.580000e+07 | 0.000000e+00 |
| 2 | 2011 | Bruce Power - A | Particulate (gross beta/gamma) | Bq | 7.060000e+06 | 6.290000e+08 |
| 3 | 2011 | Bruce Power - A | Particulate gross alpha | Bq | 5.990000e+05 | 1.010000e+06 |
| 4 | 2011 | Bruce Power - A | Total noble gases | Bq-MeV | 6.680000e+13 | 0.000000e+00 |
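The aggregation adds the separately reported A and B rows once they share one facility label. The sketch below shows that behaviour on toy rows, and also what would change if missing values were kept as NaN rather than zero (an assumption; the notebook uses zeros): with `min_count=1`, a group whose inputs are all missing stays NaN instead of becoming 0. Site and substance names are placeholders:

```python
import pandas as pd

nan = float("nan")

# Toy rows: two units already relabelled to one combined site.
toy = pd.DataFrame({
    "Year": [2011, 2011, 2011, 2011],
    "Facility Name": ["Site A & B"] * 4,
    "Substance": ["Tritium", "Tritium", "Alpha", "Alpha"],
    "Stack Emissions": [2.0, 3.0, nan, nan],
})

keys = ["Year", "Facility Name", "Substance"]

# min_count=1: a group needs at least one non-missing value to get a sum.
combined = toy.groupby(keys, as_index=False)[["Stack Emissions"]].sum(min_count=1)
print(combined)
```

Here the two Tritium rows add up to 5.0, while the Alpha group, which has no reported values, remains NaN.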
# I'm saving the clean dataframe to do a dashboard in Tableau.
df_npp.to_csv("./Datasets/df_npp.csv", index=True, header=True)
df_npp['Substance Name (English)'].unique()
array(['Carbon-14', 'Iodine-131', 'Particulate (gross beta/gamma)',
'Particulate gross alpha', 'Total noble gases', 'Tritium (HTO)',
'Estimated public dose (see footnote)', 'Elemental Tritium (HT)'],
dtype=object)
df_npp_epd = df_npp[df_npp['Substance Name (English)'] == 'Estimated public dose (see footnote)']
df_npp_epd.head()
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 12 | 2011 | Bruce Power Site | Estimated public dose (see footnote) | mSv/a | 0.0011 | 0.0 |
| 15 | 2011 | Darlington Nuclear | Estimated public dose (see footnote) | mSv/a | 0.0006 | 0.0 |
| 22 | 2011 | Gentilly-2 | Estimated public dose (see footnote) | mSv/a | 0.0015 | 0.0 |
| 28 | 2011 | Pickering Nuclear - A & B | Estimated public dose (see footnote) | mSv/a | 0.0009 | 0.0 |
| 35 | 2011 | Point Lepreau Generating Station | Estimated public dose (see footnote) | mSv/a | 0.0003 | 0.0 |
# Estimated public dose is calculated incorporating all major release pathways (emissions and discharges)
plt.figure(figsize=(16,6))
year = df_npp_epd['Year'].unique()
for facility in df_npp_epd['Facility Name'].unique():
    plt.plot(year, df_npp_epd[df_npp_epd['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Estimated public dose [mSv/a]', size=12)
plt.legend(df_npp_epd['Facility Name'].unique(), loc='upper left')
plt.grid()
plt.show()
(6) Why does Gentilly have a spike in 2012/2013 & another in 2018, years after decommissioning started (2012)?
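The plotting loops in this notebook pass the full list of years as the x-axis for every facility, which only works when each facility has exactly one row per year. A sketch of an alternative that aligns facilities on a shared year index, leaving NaN where a facility has no entry; the rows below are toy values and the site names are placeholders:

```python
import pandas as pd

# Toy long-format rows; Site Y has no entry for 2011.
toy = pd.DataFrame({
    "Year": [2011, 2012, 2012],
    "Facility Name": ["Site X", "Site X", "Site Y"],
    "Stack Emissions": [1.0, 2.0, 5.0],
})

# One row per year, one column per facility; missing combinations become NaN.
wide = toy.pivot(index="Year", columns="Facility Name", values="Stack Emissions")
print(wide)

# wide.plot(figsize=(16, 6), grid=True) would then draw one line per facility.
```

Because the wide table carries its own year index, each line is drawn against the years it actually covers.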
df_npp_tritium = df_npp[df_npp['Substance Name (English)'] == 'Tritium (HTO)']
df_npp_tritium.head()
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 5 | 2011 | Bruce Power - A | Tritium (HTO) | Bq | 6.000000e+14 | 2.950000e+14 |
| 11 | 2011 | Bruce Power - B | Tritium (HTO) | Bq | 7.170000e+14 | 5.100000e+14 |
| 20 | 2011 | Darlington Nuclear | Tritium (HTO) | Bq | 1.400000e+14 | 1.100000e+14 |
| 26 | 2011 | Gentilly-2 | Tritium (HTO) | Bq | 1.900000e+14 | 2.440000e+14 |
| 33 | 2011 | Pickering Nuclear - A & B | Tritium (HTO) | Bq | 5.500000e+14 | 3.200000e+14 |
plt.figure(figsize=(16,6))
year = df_npp_tritium['Year'].unique()
for facility in df_npp_tritium['Facility Name'].unique():
    plt.plot(year, df_npp_tritium[df_npp_tritium['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Tritium (HTO) - Stack Emissions [Bq]', size=12)
plt.legend(df_npp_tritium['Facility Name'].unique(), loc='lower right')
plt.grid()
plt.show()
(7) Why does Bruce Power - A have peaks in 2014, 2017, & 2021?
(8) Why does Bruce Power - B have a peak in 2017?
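Peaks like these are read off the plots. As a rough sketch of a reproducible screening step, the snippet below flags any year whose value exceeds a multiple of that facility's own median. The threshold of 2.0 is a hypothetical cut-off, not a regulatory criterion, and the rows are toy values with placeholder site names:

```python
import pandas as pd

# Toy series for two placeholder sites over four years.
toy = pd.DataFrame({
    "Year": [2011, 2012, 2013, 2014, 2011, 2012, 2013, 2014],
    "Facility Name": ["Site X"] * 4 + ["Site Y"] * 4,
    "Stack Emissions": [1.0, 1.1, 3.0, 0.9, 2.0, 2.1, 1.9, 2.0],
})

THRESHOLD = 2.0  # hypothetical cut-off, chosen only for illustration

# Median of each facility's own history, repeated on every row of that facility.
median = toy.groupby("Facility Name")["Stack Emissions"].transform("median")
toy["ratio_to_median"] = toy["Stack Emissions"] / median

peaks = toy[toy["ratio_to_median"] > THRESHOLD]
print(peaks[["Facility Name", "Year", "ratio_to_median"]])
```

A screen like this only shortlists candidate years; whether a flagged value is a real release event or a reporting artifact still has to be asked of the data owner.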
plt.figure(figsize=(16,6))
year = df_npp_tritium['Year'].unique()
for facility in df_npp_tritium['Facility Name'].unique():
    plt.plot(year, df_npp_tritium[df_npp_tritium['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Tritium (HTO) - Direct Discharge [Bq]', size=12)
plt.legend(df_npp_tritium['Facility Name'].unique(), loc='upper center')
plt.grid()
plt.show()
(9) Why does Bruce Power - B have a peak in 2012 & a general increasing trend?
(10) Why does Point Lepreau have a peak in 2012?
(11) Why does Darlington have a peak in 2017?
df_npp_carbon = df_npp[df_npp['Substance Name (English)'] == 'Carbon-14']
df_npp_carbon.head()
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 0 | 2011 | Bruce Power - A | Carbon-14 | Bq | 1.360000e+12 | 1.700000e+09 |
| 6 | 2011 | Bruce Power - B | Carbon-14 | Bq | 1.440000e+12 | 2.820000e+09 |
| 13 | 2011 | Darlington Nuclear | Carbon-14 | Bq | 1.000000e+12 | 1.900000e+09 |
| 21 | 2011 | Gentilly-2 | Carbon-14 | Bq | 2.710000e+11 | 1.880000e+10 |
| 27 | 2011 | Pickering Nuclear - A & B | Carbon-14 | Bq | 1.770000e+12 | 2.200000e+09 |
plt.figure(figsize=(16,6))
year = df_npp_carbon['Year'].unique()
for facility in df_npp_carbon['Facility Name'].unique():
    plt.plot(year, df_npp_carbon[df_npp_carbon['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Carbon-14 - Stack Emissions [Bq]', size=12)
plt.legend(df_npp_carbon['Facility Name'].unique(), loc='upper left')
plt.grid()
plt.show()
(12) Why does Pickering have a spike leading up to 2018?
(13) Why does Bruce Power - A have peaks in 2013 & 2015?
plt.figure(figsize=(16,6))
year = df_npp_carbon['Year'].unique()
for facility in df_npp_carbon['Facility Name'].unique():
    plt.plot(year, df_npp_carbon[df_npp_carbon['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Carbon-14 - Direct Discharge [Bq]', size=12)
plt.legend(df_npp_carbon['Facility Name'].unique(), loc='upper left')
plt.grid()
plt.show()
(14) Why does Gentilly-2 have an increased amount reported between 2014 and 2017, with peaks in 2015 & 2017?
# I will plot without Gentilly-2 to see the other facilities more clearly:
df_npp_carbon2 = df_npp_carbon[df_npp_carbon['Facility Name'] != 'Gentilly-2']
plt.figure(figsize=(16,6))
year = df_npp_carbon2['Year'].unique()
for facility in df_npp_carbon2['Facility Name'].unique():
    plt.plot(year, df_npp_carbon2[df_npp_carbon2['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Carbon-14 - Direct Discharge [Bq]', size=12)
plt.legend(df_npp_carbon2['Facility Name'].unique(), loc='upper right')
plt.grid()
plt.show()
(15) Why does Point Lepreau have peaks in 2012, 2015, and 2019?
(16) Why does Bruce Power - B have a spike leading up to 2015?
(17) Why does Darlington have peaks in 2012 & 2015?
df_npp_iodine = df_npp[df_npp['Substance Name (English)'] == 'Iodine-131']
df_npp_iodine.head()
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 1 | 2011 | Bruce Power - A | Iodine-131 | Bq | 35800000.0 | 0.0 |
| 7 | 2011 | Bruce Power - B | Iodine-131 | Bq | 41900000.0 | 0.0 |
| 16 | 2011 | Darlington Nuclear | Iodine-131 | Bq | 150000000.0 | 0.0 |
| 23 | 2011 | Gentilly-2 | Iodine-131 | Bq | 0.0 | 0.0 |
| 29 | 2011 | Pickering Nuclear - A & B | Iodine-131 | Bq | 23800000.0 | 0.0 |
plt.figure(figsize=(16,6))
year = df_npp_iodine['Year'].unique()
for facility in df_npp_iodine['Facility Name'].unique():
    plt.plot(year, df_npp_iodine[df_npp_iodine['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Iodine-131 - Stack Emissions [Bq]', size=12)
plt.legend(df_npp_iodine['Facility Name'].unique(), loc='upper left')
plt.grid()
plt.show()
(18) Why does Bruce Power - A have peaks in 2012 & 2014?
(19) Why did Point Lepreau increase so much in 2021?
plt.figure(figsize=(16,6))
year = df_npp_iodine['Year'].unique()
for facility in df_npp_iodine['Facility Name'].unique():
    plt.plot(year, df_npp_iodine[df_npp_iodine['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Iodine-131 - Direct Discharge [Bq]', size=12)
plt.legend(df_npp_iodine['Facility Name'].unique(), loc='upper left')
plt.grid()
plt.show()
(20) Why is Point Lepreau the only facility to report values, and only in 2020 & 2021?
df_npp_beta = df_npp[df_npp['Substance Name (English)'] == 'Particulate (gross beta/gamma)']
df_npp_beta.head()
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 2 | 2011 | Bruce Power - A | Particulate (gross beta/gamma) | Bq | 7060000.0 | 6.290000e+08 |
| 8 | 2011 | Bruce Power - B | Particulate (gross beta/gamma) | Bq | 50700000.0 | 2.380000e+09 |
| 17 | 2011 | Darlington Nuclear | Particulate (gross beta/gamma) | Bq | 40000000.0 | 3.100000e+10 |
| 24 | 2011 | Gentilly-2 | Particulate (gross beta/gamma) | Bq | 913000.0 | 5.340000e+09 |
| 30 | 2011 | Pickering Nuclear - A & B | Particulate (gross beta/gamma) | Bq | 11800000.0 | 1.910000e+10 |
plt.figure(figsize=(16,6))
year = df_npp_beta['Year'].unique()
for facility in df_npp_beta['Facility Name'].unique():
    plt.plot(year, df_npp_beta[df_npp_beta['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Particulate (gross beta/gamma) - Stack Emissions [Bq]', size=12)
plt.legend(df_npp_beta['Facility Name'].unique(), loc='upper left')
plt.grid()
plt.show()
(21) Why does Pickering have a peak in 2017?
plt.figure(figsize=(16,6))
year = df_npp_beta['Year'].unique()
for facility in df_npp_beta['Facility Name'].unique():
    plt.plot(year, df_npp_beta[df_npp_beta['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Particulate (gross beta/gamma) - Direct Discharge [Bq]', size=12)
plt.legend(df_npp_beta['Facility Name'].unique(), loc='upper left')
plt.grid()
plt.show()
(22) Why does Pickering have a spike in 2020?
# I'll plot without Pickering to have a better look at the others:
df_npp_beta2 = df_npp_beta[df_npp_beta['Facility Name'] != 'Pickering Nuclear - A & B']
plt.figure(figsize=(16,6))
year = df_npp_beta2['Year'].unique()
for facility in df_npp_beta2['Facility Name'].unique():
    plt.plot(year, df_npp_beta2[df_npp_beta2['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Particulate (gross beta/gamma) - Direct Discharge [Bq]', size=12)
plt.legend(df_npp_beta2['Facility Name'].unique(), loc='upper left')
plt.grid()
plt.show()
(23) Why does Darlington have an increased amount released in 2015 & 2016?
# I'll plot without Darlington now:
df_npp_beta3 = df_npp_beta2[df_npp_beta2['Facility Name'] != 'Darlington Nuclear']
plt.figure(figsize=(16,6))
year = df_npp_beta3['Year'].unique()
for facility in df_npp_beta3['Facility Name'].unique():
    plt.plot(year, df_npp_beta3[df_npp_beta3['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Particulate (gross beta/gamma) - Direct Discharge [Bq]', size=12)
plt.legend(df_npp_beta3['Facility Name'].unique(), loc='upper right')
plt.grid()
plt.show()
df_npp_alpha = df_npp[df_npp['Substance Name (English)'] == 'Particulate gross alpha']
df_npp_alpha.head()
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 3 | 2011 | Bruce Power - A | Particulate gross alpha | Bq | 599000.0 | 1010000.0 |
| 9 | 2011 | Bruce Power - B | Particulate gross alpha | Bq | 17800000.0 | 1480000.0 |
| 18 | 2011 | Darlington Nuclear | Particulate gross alpha | Bq | 0.0 | 1100000.0 |
| 31 | 2011 | Pickering Nuclear - A & B | Particulate gross alpha | Bq | 0.0 | 48000000.0 |
| 38 | 2011 | Point Lepreau Generating Station | Particulate gross alpha | Bq | 0.0 | 5800000.0 |
plt.figure(figsize=(16,6))
year = df_npp_alpha['Year'].unique()
for facility in df_npp_alpha['Facility Name'].unique():
    plt.plot(year, df_npp_alpha[df_npp_alpha['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Particulate gross alpha - Stack Emissions [Bq]', size=12)
plt.legend(df_npp_alpha['Facility Name'].unique(), loc='upper right')
plt.grid()
plt.show()
(24) Why does Bruce Power - B have a peak in 2011?
(25) Why does Darlington have a spike between 2013 & 2016?
plt.figure(figsize=(16,6))
year = df_npp_alpha['Year'].unique()
for facility in df_npp_alpha['Facility Name'].unique():
    plt.plot(year, df_npp_alpha[df_npp_alpha['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Particulate gross alpha - Direct Discharge [Bq]', size=12)
plt.legend(df_npp_alpha['Facility Name'].unique(), loc='upper left')
plt.grid()
plt.show()
(26) Why does Point Lepreau have a peak in 2014?
(27) Why does Pickering have a peak in 2011?
(28) Gentilly-2 doesn't report anything for Particulate gross alpha (neither Stack Emissions nor Direct Discharge).
df_npp_ht = df_npp[df_npp['Substance Name (English)'] == 'Elemental Tritium (HT)']
df_npp_ht.head()
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 14 | 2011 | Darlington Nuclear | Elemental Tritium (HT) | Bq | 8.800000e+13 | 0.0 |
| 55 | 2012 | Darlington Nuclear | Elemental Tritium (HT) | Bq | 2.600000e+13 | 0.0 |
| 96 | 2013 | Darlington Nuclear | Elemental Tritium (HT) | Bq | 1.800000e+13 | 0.0 |
| 137 | 2014 | Darlington Nuclear | Elemental Tritium (HT) | Bq | 5.200000e+13 | 0.0 |
| 178 | 2015 | Darlington Nuclear | Elemental Tritium (HT) | Bq | 1.700000e+13 | 0.0 |
plt.figure(figsize=(16,6))
year = df_npp_ht['Year'].unique()
for facility in df_npp_ht['Facility Name'].unique():
    plt.plot(year, df_npp_ht[df_npp_ht['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Elemental Tritium (HT) - Stack Emissions [Bq]', size=12)
plt.legend(df_npp_ht['Facility Name'].unique(), loc='upper left')
plt.grid()
plt.show()
(29) Why does Darlington have peaks in 2011, 2014, & 2017?
(30) Why is Darlington the only facility that reports Elemental Tritium (HT), and only as Stack Emissions?
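Questions such as (28) and (30) are about which facilities report a given substance at all. The following is a minimal sketch of how that coverage could be tabulated instead of read off the plots. The `sample` frame and the `reporting_facilities` helper are hypothetical; the column names follow the notebook, and the Gentilly-2 row is an assumed all-zero record standing in for "nothing reported".

```python
import pandas as pd

# Illustrative subset only; the Gentilly-2 row is an assumption, not a dataset value.
sample = pd.DataFrame({
    'Facility Name': ['Darlington Nuclear', 'Darlington Nuclear',
                      'Pickering Nuclear - A & B', 'Gentilly-2'],
    'Substance Name (English)': ['Elemental Tritium (HT)', 'Particulate gross alpha',
                                 'Particulate gross alpha', 'Particulate gross alpha'],
    'Stack Emissions': [8.8e13, 0.0, 0.0, 0.0],
    'Direct Discharge': [0.0, 1.1e6, 4.8e7, 0.0],
})

def reporting_facilities(df):
    """Map each substance to the sorted facilities with any non-zero value."""
    nonzero = df[(df['Stack Emissions'] != 0) | (df['Direct Discharge'] != 0)]
    return (nonzero.groupby('Substance Name (English)')['Facility Name']
                   .unique()
                   .apply(sorted)
                   .to_dict())

coverage = reporting_facilities(sample)
```

Because the notebook replaces 'NRM | NRS' with zeros before plotting, a check like this cannot distinguish "not required to monitor" from a true zero; it only shows where no positive value appears.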
df_npp_noble = df_npp[df_npp['Substance Name (English)'] == 'Total noble gases']
df_npp_noble.head()
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 4 | 2011 | Bruce Power - A | Total noble gases | Bq-MeV | 6.680000e+13 | 0.0 |
| 10 | 2011 | Bruce Power - B | Total noble gases | Bq-MeV | 3.640000e+12 | 0.0 |
| 19 | 2011 | Darlington Nuclear | Total noble gases | Bq-MeV | 2.200000e+13 | 0.0 |
| 25 | 2011 | Gentilly-2 | Total noble gases | Bq-MeV | 1.160000e+11 | 0.0 |
| 32 | 2011 | Pickering Nuclear - A & B | Total noble gases | Bq-MeV | 1.830000e+14 | 0.0 |
plt.figure(figsize=(16,6))
year = df_npp_noble['Year'].unique()
for facility in df_npp_noble['Facility Name'].unique():
    plt.plot(year, df_npp_noble[df_npp_noble['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Total noble gases - Stack Emissions [Bq-MeV]', size=12)
plt.legend(df_npp_noble['Facility Name'].unique(), loc='upper center')
plt.grid()
plt.show()
(31) Why does Pickering Nuclear release so much more than the rest? And why does it have peaks in 2011, 2017, & 2021?
(32) Why does Point Lepreau have a peak in 2016?
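The peaks listed in the questions so far were identified by eye from the plots. As a complement, here is a minimal sketch of one way to flag candidate peaks programmatically: a value is flagged when it exceeds a multiple of that facility's own median. The `flag_peaks` helper, the threshold `factor=3.0`, and the `demo` data are all assumptions chosen for illustration, not values from the dataset.

```python
import pandas as pd

def flag_peaks(df, value_col, factor=3.0):
    """Return rows whose value exceeds `factor` times the median for that facility."""
    median = df.groupby('Facility Name')[value_col].transform('median')
    return df[df[value_col] > factor * median]

# Hypothetical series for two facilities (illustrative values only).
demo = pd.DataFrame({
    'Year': [2015, 2016, 2017, 2018] * 2,
    'Facility Name': ['Facility X'] * 4 + ['Facility Y'] * 4,
    'Stack Emissions': [1.0, 1.2, 9.0, 1.1, 5.0, 5.5, 4.8, 5.2],
})
peaks = flag_peaks(demo, 'Stack Emissions')
```

A median-based rule is only one possible choice; when a facility's median is zero, every positive value would be flagged, so the output would still need manual review.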
plt.figure(figsize=(16,6))
year = df_npp_noble['Year'].unique()
for facility in df_npp_noble['Facility Name'].unique():
    plt.plot(year, df_npp_noble[df_npp_noble['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Total noble gases - Direct Discharge [Bq-MeV]', size=12)
plt.legend(df_npp_noble['Facility Name'].unique(), loc='upper left')
plt.grid()
plt.show()
facilities = df_npp['Facility Name'].unique()
for f in facilities:
    df = df_npp[df_npp['Facility Name'] == f]
    print(f,'\n')
    subs = df['Substance Name (English)'].unique()
    for s in subs:
        df2 = df[df['Substance Name (English)'] == s]
        fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10,8))
        fig.subplots_adjust(hspace=0.5)
        ax1.plot(df2['Year'], df2['Stack Emissions'], color='green')
        ax1.set_title(s + ' - Stack Emissions', fontdict={'fontsize': 14, 'fontweight': 'medium'})
        ax1.set_xlabel('Year', fontdict={'fontsize': 14, 'fontweight': 'medium'})
        ax1.grid()
        ax2.plot(df2['Year'], df2['Direct Discharge'], color='red')
        ax2.set_title(s + ' - Direct Discharge', fontdict={'fontsize': 14, 'fontweight': 'medium'})
        ax2.set_xlabel('Year', fontdict={'fontsize': 14, 'fontweight': 'medium'})
        ax2.grid()
        plt.show()
Bruce Power - A
Bruce Power - B
Bruce Power Site
Darlington Nuclear
Gentilly-2
Pickering Nuclear - A & B
Point Lepreau Generating Station
df_npp_2020 = pd.read_csv("./Datasets/2020/Nuclear Power Plants.csv")
df_npp_2020.head()
| | _id | Year \| Annee | NPRI ID \| ID INRP | Company Name \| Raison Sociale | Facility Name \| Nom de l'installation | City \| Ville | CSD \| SDR | CA or CMA \| AR ou RMR | Economic Region \| Region economique | Province \| Province | Latitude \| Latitude | Longitude \| Longitude | Substance Name (English) \| Nom de substance (Anglais) | Substance Name (French) \| Nom de substance (Francais) | Units \| Unites | Stack Emissions \| Emissions de cheminees | Direct Discharge \| Evacuations directes | Footnotes \| Notes de bas de page |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2020 | 1445 | Hydro-Québec | Gentilly-2 | Bécancour | NaN | NaN | NaN | QC | 46.3958 | -72.3569 | Tritium (HTO) | Tritium (Eau tritiée) | Bq | 8.11E+13 | 1.97E+13 | NaN |
| 1 | 2 | 2020 | 1445 | Hydro-Québec | Gentilly-2 | Bécancour | NaN | NaN | NaN | QC | 46.3958 | -72.3569 | Carbon-14 | Carbone-14 | Bq | 8.18E+09 | 4.92E+07 | NaN |
| 2 | 3 | 2020 | 1445 | Hydro-Québec | Gentilly-2 | Bécancour | NaN | NaN | NaN | QC | 46.3958 | -72.3569 | Total noble gases | Total des gaz nobles | Bq-MeV | LD | NRM \| NRS | <LD = 0 |
| 3 | 4 | 2020 | 1445 | Hydro-Québec | Gentilly-2 | Bécancour | NaN | NaN | NaN | QC | 46.3958 | -72.3569 | Iodine-131 | Iode-131 | Bq | LD | NRM \| NRS | <LD = 0 |
| 4 | 5 | 2020 | 1445 | Hydro-Québec | Gentilly-2 | Bécancour | NaN | NaN | NaN | QC | 46.3958 | -72.3569 | Particulate (gross beta/gamma) | Particules (bêta brutes/gamma brutes) | Bq | 4.47E+05 | 1.65E+08 | NaN |
# It doesn't have the same columns, so I will keep only the essentials:
df_npp_2020 = df_npp_2020[['Year | Annee', "Facility Name | Nom de l'installation", 'Substance Name (English) | Nom de substance (Anglais)', 'Stack Emissions | Emissions de cheminees',
                           'Direct Discharge | Evacuations directes']]
# Reassigning (rather than renaming in place) avoids pandas' SettingWithCopyWarning on a column slice:
df_npp_2020 = df_npp_2020.rename(columns={'Year | Annee': 'Year', "Facility Name | Nom de l'installation":'Facility Name', 'Substance Name (English) | Nom de substance (Anglais)':'Substance Name (English)', 'Stack Emissions | Emissions de cheminees':'Stack Emissions','Direct Discharge | Evacuations directes':'Direct Discharge'})
df_npp_2020.head()
| | Year | Facility Name | Substance Name (English) | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|
| 0 | 2020 | Gentilly-2 | Tritium (HTO) | 8.11E+13 | 1.97E+13 |
| 1 | 2020 | Gentilly-2 | Carbon-14 | 8.18E+09 | 4.92E+07 |
| 2 | 2020 | Gentilly-2 | Total noble gases | LD | NRM \| NRS |
| 3 | 2020 | Gentilly-2 | Iodine-131 | LD | NRM \| NRS |
| 4 | 2020 | Gentilly-2 | Particulate (gross beta/gamma) | 4.47E+05 | 1.65E+08 |
# I will go back to the original copy of the dataframe & keep only the essentials to compare:
df_npp_2021 = df_npp_0[['Year | Année', "Facility Name | Nom de l'installation", 'Substance Name (English) | Nom de substance (Anglais)', 'Stack Emissions | Émissions de cheminées',
                        'Direct Discharge | Évacuations directes']]
# Reassigning (rather than renaming in place) avoids pandas' SettingWithCopyWarning on a column slice:
df_npp_2021 = df_npp_2021.rename(columns={'Year | Année': 'Year', "Facility Name | Nom de l'installation":'Facility Name', 'Substance Name (English) | Nom de substance (Anglais)':'Substance Name (English)', 'Stack Emissions | Émissions de cheminées':'Stack Emissions','Direct Discharge | Évacuations directes':'Direct Discharge'})
df_npp_2021.head()
| | Year | Facility Name | Substance Name (English) | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|
| 0 | 2021 | Gentilly-2 | Tritium (HTO) | 4.43E+13 | 1.56E+14 |
| 1 | 2021 | Gentilly-2 | Carbon-14 | 6.17E+09 | 6.07E+07 |
| 2 | 2021 | Gentilly-2 | Total noble gases | NRM \| NRS | NRM \| NRS |
| 3 | 2021 | Gentilly-2 | Iodine-131 | NRM \| NRS | NRM \| NRS |
| 4 | 2021 | Gentilly-2 | Particulate (gross beta/gamma) | 5.11E+05 | 7.11E+07 |
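The two files differ only in how the bilingual headers are spelled: the 2020 file drops the French accents ('Annee'), while the 2021 file keeps them ('Année'), which is why two separate rename dictionaries were needed. A minimal sketch of how the headers could be normalized generically follows. The helper names `strip_accents` and `english_label` are hypothetical, and the approach assumes every header has the form 'English | Français'.

```python
import unicodedata

def strip_accents(text):
    """Remove diacritics, e.g. 'Année' -> 'Annee'."""
    decomposed = unicodedata.normalize('NFKD', text)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))

def english_label(column):
    """Keep the English half of a bilingual 'English | Français' header."""
    return column.split(' | ')[0].strip()

# Header spellings copied from the two files used in this notebook.
headers_2020 = ['Year | Annee', 'Stack Emissions | Emissions de cheminees']
headers_2021 = ['Year | Année', 'Stack Emissions | Émissions de cheminées']

same_after_stripping = [strip_accents(h) for h in headers_2021] == headers_2020
short_names = [english_label(h) for h in headers_2021]
```

Since `DataFrame.rename` accepts a function for `columns`, something like `df.rename(columns=english_label)` could apply the same mapping to either file, provided no two headers share the same English half.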
# Now that they have the same columns, I will remove 2021 from the new dataframe and compare the remaining with 2020.
df_npp_2021 = df_npp_2021[df_npp_2021['Year'] != 2021].reset_index(drop = True)
df_npp_2021.head()
| | Year | Facility Name | Substance Name (English) | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|
| 0 | 2020 | Gentilly-2 | Tritium (HTO) | 8.11E+13 | 1.97E+13 |
| 1 | 2020 | Gentilly-2 | Carbon-14 | 8.19E+09 | 4.92E+07 |
| 2 | 2020 | Gentilly-2 | Total noble gases | NRM \| NRS | NRM \| NRS |
| 3 | 2020 | Gentilly-2 | Iodine-131 | NRM \| NRS | NRM \| NRS |
| 4 | 2020 | Gentilly-2 | Particulate (gross beta/gamma) | 4.47E+05 | 1.65E+08 |
# I will concatenate both dataframes & drop every row that appears in both (keep=False), so only the differences remain. This produces a df with the 2021 release's values first, and the 2020 release's values after:
df = pd.concat([df_npp_2021,df_npp_2020]).drop_duplicates(keep=False)
df
# 61 changes in total (122 rows: one per release for each change). Some are 'LD' to 'NRM | NRS' changes; the others are numerical changes.
| | Year | Facility Name | Substance Name (English) | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|
| 1 | 2020 | Gentilly-2 | Carbon-14 | 8.19E+09 | 4.92E+07 |
| 2 | 2020 | Gentilly-2 | Total noble gases | NRM \| NRS | NRM \| NRS |
| 3 | 2020 | Gentilly-2 | Iodine-131 | NRM \| NRS | NRM \| NRS |
| 13 | 2020 | Bruce Power - A | Tritium (HTO) | 3.40E+14 | 2.50E+14 |
| 14 | 2020 | Bruce Power - A | Carbon-14 | 1.60E+12 | 1.10E+09 |
| ... | ... | ... | ... | ... | ... |
| 412 | 2011 | Gentilly-2 | Carbon-14 | 2.71E+11 | 1.89E+10 |
| 415 | 2011 | Gentilly-2 | Particulate (gross beta/gamma) | 9.13E+05 | 5.35E+09 |
| 417 | 2011 | Point Lepreau Generating Station | Tritium (HTO) | 4.30E+11 | 3.40E+13 |
| 418 | 2011 | Point Lepreau Generating Station | Carbon-14 | 3.30E+15 | 1.40E+07 |
| 429 | 2011 | Bruce Power - A | Particulate gross alpha | 5.99E+05 | 1.09E+06 |
122 rows × 5 columns
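The concatenate-and-drop-duplicates approach returns each change as two separate rows that must be matched up by index. An alternative is to join the two releases on their key columns so that each change appears on one row. The sketch below illustrates that idea; the `paired_changes` helper and the suffix names are hypothetical, and it assumes each (Year, Facility Name, Substance) combination occurs at most once per release. The sample rows are copied from the tables above.

```python
import pandas as pd

KEYS = ['Year', 'Facility Name', 'Substance Name (English)']
VALUES = ['Stack Emissions', 'Direct Discharge']

def paired_changes(new, old):
    """Join two releases on the key columns and keep rows where any value differs."""
    merged = new.merge(old, on=KEYS, suffixes=('_2021', '_2020'))
    differs = pd.Series(False, index=merged.index)
    for col in VALUES:
        differs = differs | (merged[col + '_2021'] != merged[col + '_2020'])
    return merged[differs]

substances = ['Tritium (HTO)', 'Carbon-14', 'Total noble gases']
base = {'Year': [2020] * 3, 'Facility Name': ['Gentilly-2'] * 3,
        'Substance Name (English)': substances}
release_2021 = pd.DataFrame({**base,
    'Stack Emissions': ['8.11E+13', '8.19E+09', 'NRM | NRS'],
    'Direct Discharge': ['1.97E+13', '4.92E+07', 'NRM | NRS']})
release_2020 = pd.DataFrame({**base,
    'Stack Emissions': ['8.11E+13', '8.18E+09', 'LD'],
    'Direct Discharge': ['1.97E+13', '4.92E+07', 'NRM | NRS']})

changed = paired_changes(release_2021, release_2020)
```

The values are compared as strings, exactly as they appear in the CSV files, so a purely cosmetic difference in formatting would also be reported as a change.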
# Looking for LD (2020) to NRM (2021) changes first.
df[(df['Stack Emissions'].isin(['NRM | NRS', 'LD'])) & (df['Direct Discharge'] == 'NRM | NRS')]
# I also checked for 'Direct Discharge' values of 'NRM | NRS' or 'LD', but there weren't any changes of that kind there.
# The second condition filters out 2 results where 'Direct Discharge' changed numerically but 'Stack Emissions' stayed the same.
# Finally, note index 63, which appears only once in the table. I will address that below.
| | Year | Facility Name | Substance Name (English) | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|
| 2 | 2020 | Gentilly-2 | Total noble gases | NRM \| NRS | NRM \| NRS |
| 3 | 2020 | Gentilly-2 | Iodine-131 | NRM \| NRS | NRM \| NRS |
| 43 | 2019 | Gentilly-2 | Total noble gases | NRM \| NRS | NRM \| NRS |
| 44 | 2019 | Gentilly-2 | Iodine-131 | NRM \| NRS | NRM \| NRS |
| 84 | 2018 | Gentilly-2 | Total noble gases | NRM \| NRS | NRM \| NRS |
| 85 | 2018 | Gentilly-2 | Iodine-131 | NRM \| NRS | NRM \| NRS |
| 131 | 2017 | Gentilly-2 | Total noble gases | NRM \| NRS | NRM \| NRS |
| 132 | 2017 | Gentilly-2 | Iodine-131 | NRM \| NRS | NRM \| NRS |
| 178 | 2016 | Gentilly-2 | Total noble gases | NRM \| NRS | NRM \| NRS |
| 179 | 2016 | Gentilly-2 | Iodine-131 | NRM \| NRS | NRM \| NRS |
| 225 | 2015 | Gentilly-2 | Total noble gases | NRM \| NRS | NRM \| NRS |
| 226 | 2015 | Gentilly-2 | Iodine-131 | NRM \| NRS | NRM \| NRS |
| 2 | 2020 | Gentilly-2 | Total noble gases | LD | NRM \| NRS |
| 3 | 2020 | Gentilly-2 | Iodine-131 | LD | NRM \| NRS |
| 43 | 2019 | Gentilly-2 | Total noble gases | LD | NRM \| NRS |
| 44 | 2019 | Gentilly-2 | Iodine-131 | LD | NRM \| NRS |
| 63 | 2019 | Bruce Power - B | Iodine-131 | LD | NRM \| NRS |
| 84 | 2018 | Gentilly-2 | Total noble gases | LD | NRM \| NRS |
| 85 | 2018 | Gentilly-2 | Iodine-131 | LD | NRM \| NRS |
| 131 | 2017 | Gentilly-2 | Total noble gases | LD | NRM \| NRS |
| 132 | 2017 | Gentilly-2 | Iodine-131 | LD | NRM \| NRS |
| 178 | 2016 | Gentilly-2 | Total noble gases | LD | NRM \| NRS |
| 179 | 2016 | Gentilly-2 | Iodine-131 | LD | NRM \| NRS |
| 225 | 2015 | Gentilly-2 | Total noble gases | LD | NRM \| NRS |
| 226 | 2015 | Gentilly-2 | Iodine-131 | LD | NRM \| NRS |
(33) Apart from index 63 (Bruce Power - B, addressed below), every 'Stack Emissions' value that was 'LD' in the 2020 release became 'NRM | NRS' in the 2021 release. This affects only Gentilly-2, for Iodine-131 & Total noble gases, from 2015 to 2020.
# Why does index 63 appear only once in the table above, instead of twice (once for 2021 & once for 2020)?
df.loc[[63]]
| | Year | Facility Name | Substance Name (English) | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|
| 63 | 2019 | Bruce Power - B | Iodine-131 | 4.40E+05 | NRM \| NRS |
| 63 | 2019 | Bruce Power - B | Iodine-131 | LD | NRM \| NRS |
# Numerical changes occur at index numbers 364, 417, 418, 371, 280, 281, 229, 51, 287, 429, 330, 60, 338. (I filtered out the 'LD' & 'NRM' rows to get this list; 13 in total.)
df.loc[[364, 417, 418, 371, 280, 281, 229, 51, 287, 429, 330, 60, 338]] # First value: 2021, Second Value: 2020.
| | Year | Facility Name | Substance Name (English) | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|
| 364 | 2012 | Gentilly-2 | Tritium (HTO) | 2.09E+14 | 3.52E+14 |
| 364 | 2012 | Gentilly-2 | Tritium (HTO) | 2.13E+14 | 3.51E+14 |
| 417 | 2011 | Point Lepreau Generating Station | Tritium (HTO) | 4.50E+13 | 3.40E+13 |
| 417 | 2011 | Point Lepreau Generating Station | Tritium (HTO) | 4.30E+11 | 3.40E+13 |
| 418 | 2011 | Point Lepreau Generating Station | Carbon-14 | 2.80E+10 | 3.80E+07 |
| 418 | 2011 | Point Lepreau Generating Station | Carbon-14 | 3.30E+15 | 1.40E+07 |
| 371 | 2012 | Point Lepreau Generating Station | Carbon-14 | 3.70E+10 | 1.40E+10 |
| 371 | 2012 | Point Lepreau Generating Station | Carbon-14 | 3.70E+10 | 3.80E+10 |
| 280 | 2014 | Point Lepreau Generating Station | Particulate (gross beta/gamma) | NRM \| NRS | 1.00E+07 |
| 280 | 2014 | Point Lepreau Generating Station | Particulate (gross beta/gamma) | NRM \| NRS | 1.50E+08 |
| 281 | 2014 | Point Lepreau Generating Station | Particulate gross alpha | NRM \| NRS | 8.30E+07 |
| 281 | 2014 | Point Lepreau Generating Station | Particulate gross alpha | NRM \| NRS | 8.60E+06 |
| 229 | 2015 | Point Lepreau Generating Station | Tritium (HTO) | 1.40E+14 | 1.40E+14 |
| 229 | 2015 | Point Lepreau Generating Station | Tritium (HTO) | 1.40E+13 | 1.40E+14 |
| 51 | 2019 | Point Lepreau Generating Station | Particulate (gross beta/gamma) | 2.20E+06 | 8.40E+07 |
| 51 | 2019 | Point Lepreau Generating Station | Particulate (gross beta/gamma) | 1.14E+08 | 8.40E+07 |
| 287 | 2014 | Bruce Power - A | Particulate (gross beta/gamma) | 3.13E+06 | 9.57E+08 |
| 287 | 2014 | Bruce Power - A | Particulate (gross beta/gamma) | 3.13E+06 | 1.02E+09 |
| 429 | 2011 | Bruce Power - A | Particulate gross alpha | 5.99E+05 | 1.01E+06 |
| 429 | 2011 | Bruce Power - A | Particulate gross alpha | 5.99E+05 | 1.09E+06 |
| 330 | 2013 | Bruce Power - A | Tritium (HTO) | 5.09E+14 | 1.96E+14 |
| 330 | 2013 | Bruce Power - A | Tritium (HTO) | 5.04E+14 | 1.96E+14 |
| 60 | 2019 | Bruce Power - B | Tritium (HTO) | 3.29E+14 | 8.82E+14 |
| 60 | 2019 | Bruce Power - B | Tritium (HTO) | 3.29E+14 | 8.84E+14 |
| 338 | 2013 | Bruce Power - B | Total noble gases | 3.71E+12 | NRM \| NRS |
| 338 | 2013 | Bruce Power - B | Total noble gases | 5.25E+13 | NRM \| NRS |
(35) Why did these 13 sets of values change between reports? Why wasn't the change documented anywhere?
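Not all 13 revisions are of the same size: some differ in the third significant figure, while others differ by orders of magnitude. A minimal sketch of how the size of each revision could be quantified as a base-10 log ratio follows. The `orders_of_magnitude` helper is hypothetical, and the four entries are a subset copied from the table above (first value 2021 release, second value 2020 release).

```python
import math

# (index, column, value in 2021 release, value in 2020 release)
revisions = [
    (417, 'Stack Emissions', 4.50e13, 4.30e11),
    (418, 'Stack Emissions', 2.80e10, 3.30e15),
    (229, 'Stack Emissions', 1.40e14, 1.40e13),
    (60, 'Direct Discharge', 8.82e14, 8.84e14),
]

def orders_of_magnitude(new, old):
    """Signed log10 ratio: 0 means unchanged, +1 means ten times larger in the newer release."""
    return math.log10(new / old)

shifts = {idx: round(orders_of_magnitude(new, old), 2)
          for idx, _, new, old in revisions}
```

On these rows, index 60 barely moves, index 229 shifts by exactly one order of magnitude, and indices 417 and 418 shift by roughly two and five orders of magnitude respectively, so the large revisions may deserve separate attention from the small ones.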
df_npf = pd.read_csv("./Datasets/Nuclear Processing Facilities.csv")
df_npf
| | Year \| Année | NPRI ID \| ID INRP | Company Name \| Raison Sociale | Facility Name \| Nom de l'installation | City \| Ville | CSD \| SDR | CA or CMA \| AR ou RMR | Economic Region \| Région économique | Province \| Province | Latitude \| Latitude | Longitude \| Longitude | Substance Name (English) \| Nom de substance (Anglais) | Substance Name (French) \| Nom de substance (Français) | Units \| Unités | Stack Emissions \| Émissions de cheminées | Direct Discharge \| Évacuations directes | Footnotes \| Notes de bas de page |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021 | 3657.0 | Cameco | Blind River Refinery | Blind River | NaN | NaN | NaN | ON | 46.1814 | -83.0177 | Uranium | Uranium | kg | 3.2 | 2.2 | NaN |
| 1 | 2021 | 3657.0 | Cameco | Blind River Refinery | Blind River | NaN | NaN | NaN | ON | 46.1814 | -83.0177 | Radium-226 | Radium-226 | MBq | NRM \| NRS | 2.2 | NaN |
| 2 | 2021 | 3657.0 | Cameco | Blind River Refinery | Blind River | NaN | NaN | NaN | ON | 46.1814 | -83.0177 | Estimated public dose (see footnote) | Dose estimée au public (voir note de bas de page) | mSv/a | 0.009 | NRM \| NRS | Estimated public dose is calculated incorporat... |
| 3 | 2021 | 1145.0 | Cameco | Port Hope Conversion Facility | Port Hope | NaN | NaN | NaN | ON | 43.9437 | -78.2954 | Uranium | Uranium | kg | 39 | NRM \| NRS | NaN |
| 4 | 2021 | 1145.0 | Cameco | Port Hope Conversion Facility | Port Hope | NaN | NaN | NaN | ON | 43.9437 | -78.2954 | Estimated public dose (see footnote) | Dose estimée au public (voir note de bas de page) | mSv/a | 0.072 | NRM \| NRS | Site 1, Estimated public dose is calculated in... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 189 | 2013 | 2247.0 | Nordion (Canada) Inc. | Nordion - Ottawa | Ottawa | NaN | NaN | NaN | ON | 45.3408 | -75.9179 | Iodine-131 | Iode-131 | GBq | 3.90E-01 | NRM \| NRS | NaN |
| 190 | 2013 | 2247.0 | Nordion (Canada) Inc. | Nordion - Ottawa | Ottawa | NaN | NaN | NaN | ON | 45.3408 | -75.9179 | Xenon-133 | Xénon-133 | GBq | 3.07E+04 | NRM \| NRS | NaN |
| 191 | 2013 | 2247.0 | Nordion (Canada) Inc. | Nordion - Ottawa | Ottawa | NaN | NaN | NaN | ON | 45.3408 | -75.9179 | Xenon-135 | Xénon-135 | GBq | 2.82E+04 | NRM \| NRS | NaN |
| 192 | 2013 | 2247.0 | Nordion (Canada) Inc. | Nordion - Ottawa | Ottawa | NaN | NaN | NaN | ON | 45.3408 | -75.9179 | Xenon-135m | Xénon-135m | GBq | 4.34E+04 | NRM \| NRS | NaN |
| 193 | 2013 | 2247.0 | Nordion (Canada) Inc. | Nordion - Ottawa | Ottawa | NaN | NaN | NaN | ON | 45.3408 | -75.9179 | Estimated public dose (see footnote) | Dose estimée au public (voir note de bas de page) | mSv/a | 0.0195 | NRM \| NRS | Estimated public dose is calculated incorporat... |
194 rows × 17 columns
(1) Why does the data start in 2013? Can we get older data?
# I'm creating a copy because I will need it later.
df_npf_0 = df_npf.copy()
df_npf["Facility Name | Nom de l'installation"].unique()
array(['Blind River Refinery', 'Port Hope Conversion Facility',
'Cameco Fuel Manufacturing', 'BWXT - Toronto',
'BWXT - Peterborough', 'SRBT', 'Nordion - Ottawa'], dtype=object)
# Renaming columns to English only:
df_npf.rename(columns={'Year | Année': 'Year', 'NPRI ID | ID INRP': 'NPRI ID','Company Name | Raison Sociale':'Company Name',"Facility Name | Nom de l'installation":'Facility Name', 'City | Ville':'City', 'CSD | SDR':'CSD','CA or CMA | AR ou RMR':'CA or CMA', 'Economic Region | Région économique':'Economic Region','Province | Province':'Province', 'Latitude | Latitude':'Latitude', 'Longitude | Longitude':'Longitude','Substance Name (English) | Nom de substance (Anglais)':'Substance Name (English)','Units | Unités':'Units', 'Stack Emissions | Émissions de cheminées':'Stack Emissions','Direct Discharge | Évacuations directes':'Direct Discharge','Footnotes | Notes de bas de page':'Footnotes'}, inplace = True)
df_npf.head()
| | Year | NPRI ID | Company Name | Facility Name | City | CSD | CA or CMA | Economic Region | Province | Latitude | Longitude | Substance Name (English) | Substance Name (French) \| Nom de substance (Français) | Units | Stack Emissions | Direct Discharge | Footnotes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021 | 3657.0 | Cameco | Blind River Refinery | Blind River | NaN | NaN | NaN | ON | 46.1814 | -83.0177 | Uranium | Uranium | kg | 3.2 | 2.2 | NaN |
| 1 | 2021 | 3657.0 | Cameco | Blind River Refinery | Blind River | NaN | NaN | NaN | ON | 46.1814 | -83.0177 | Radium-226 | Radium-226 | MBq | NRM \| NRS | 2.2 | NaN |
| 2 | 2021 | 3657.0 | Cameco | Blind River Refinery | Blind River | NaN | NaN | NaN | ON | 46.1814 | -83.0177 | Estimated public dose (see footnote) | Dose estimée au public (voir note de bas de page) | mSv/a | 0.009 | NRM \| NRS | Estimated public dose is calculated incorporat... |
| 3 | 2021 | 1145.0 | Cameco | Port Hope Conversion Facility | Port Hope | NaN | NaN | NaN | ON | 43.9437 | -78.2954 | Uranium | Uranium | kg | 39 | NRM \| NRS | NaN |
| 4 | 2021 | 1145.0 | Cameco | Port Hope Conversion Facility | Port Hope | NaN | NaN | NaN | ON | 43.9437 | -78.2954 | Estimated public dose (see footnote) | Dose estimée au public (voir note de bas de page) | mSv/a | 0.072 | NRM \| NRS | Site 1, Estimated public dose is calculated in... |
I noticed some values are expressed as "NRM | NRS" (Not Required to Monitor). I will summarize which values are reported that way before replacing them with zeros so that the columns can be plotted.
# Stack Emission column first:
df_npf_miss_stack = df_npf[df_npf['Stack Emissions'] == 'NRM | NRS']
df_npf_miss_stack[['Year','Facility Name', 'Substance Name (English)', 'Stack Emissions']].sort_values(by =['Facility Name', 'Substance Name (English)', 'Year'])
| | Year | Facility Name | Substance Name (English) | Stack Emissions |
|---|---|---|---|---|
| 174 | 2013 | Blind River Refinery | Radium-226 | NRM \| NRS |
| 153 | 2014 | Blind River Refinery | Radium-226 | NRM \| NRS |
| 132 | 2015 | Blind River Refinery | Radium-226 | NRM \| NRS |
| 111 | 2016 | Blind River Refinery | Radium-226 | NRM \| NRS |
| 89 | 2017 | Blind River Refinery | Radium-226 | NRM \| NRS |
| 67 | 2018 | Blind River Refinery | Radium-226 | NRM \| NRS |
| 45 | 2019 | Blind River Refinery | Radium-226 | NRM \| NRS |
| 23 | 2020 | Blind River Refinery | Radium-226 | NRM \| NRS |
| 1 | 2021 | Blind River Refinery | Radium-226 | NRM \| NRS |
(2) Summary of Missing Data (NRM) for Stack Emissions:
# Direct Discharge column next:
df_npf_miss_discharge = df_npf[df_npf['Direct Discharge'] == 'NRM | NRS']
df_npf_miss_discharge[['Year','Facility Name', 'Substance Name (English)', 'Direct Discharge']].sort_values(by =['Facility Name', 'Substance Name (English)', 'Year'])
| | Year | Facility Name | Substance Name (English) | Direct Discharge |
|---|---|---|---|---|
| 183 | 2013 | BWXT - Peterborough | Estimated public dose (see footnote) | NRM \| NRS |
| 162 | 2014 | BWXT - Peterborough | Estimated public dose (see footnote) | NRM \| NRS |
| 141 | 2015 | BWXT - Peterborough | Estimated public dose (see footnote) | NRM \| NRS |
| 120 | 2016 | BWXT - Peterborough | Estimated public dose (see footnote) | NRM \| NRS |
| 99 | 2017 | BWXT - Peterborough | Estimated public dose (see footnote) | NRM \| NRS |
| ... | ... | ... | ... | ... |
| 100 | 2017 | SRBT | Tritium (HTO) | NRM \| NRS |
| 78 | 2018 | SRBT | Tritium (HTO) | NRM \| NRS |
| 56 | 2019 | SRBT | Tritium (HTO) | NRM \| NRS |
| 34 | 2020 | SRBT | Tritium (HTO) | NRM \| NRS |
| 12 | 2021 | SRBT | Tritium (HTO) | NRM \| NRS |
176 rows × 4 columns
(3) Summary of Missing Data (NRM) for Direct Discharge:
Note 1: Estimated public dose is missing from Direct Discharge for every facility, because it is reported under Stack Emissions "incorporating all major release pathways (emissions and discharges)", according to the footnote.
Note 2: Blind River Refinery is the only facility to report Direct Discharge values for Uranium; SRBT and Nordion - Ottawa don't report Uranium at all.
Note 3: Blind River Refinery is the only facility to report any Direct Discharge values.
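With 176 'NRM | NRS' rows for Direct Discharge, the truncated table above cannot show the full picture. A minimal sketch of how the NRM entries could be counted per facility and substance follows. The `nrm_counts` helper is hypothetical, and the `sample` frame holds only a handful of rows copied from the tables above.

```python
import pandas as pd

NRM = 'NRM | NRS'

def nrm_counts(df, column):
    """Count NRM-coded years per facility and substance for one release column."""
    flagged = df[df[column] == NRM]
    return flagged.groupby(['Facility Name', 'Substance Name (English)']).size()

# A few rows copied from the dataset for illustration.
sample = pd.DataFrame({
    'Year': [2021, 2021, 2021, 2020, 2021],
    'Facility Name': ['Blind River Refinery', 'Blind River Refinery',
                      'Port Hope Conversion Facility', 'SRBT', 'SRBT'],
    'Substance Name (English)': ['Uranium', 'Radium-226', 'Uranium',
                                 'Tritium (HTO)', 'Tritium (HTO)'],
    'Direct Discharge': ['2.2', '2.2', NRM, NRM, NRM],
})

counts = nrm_counts(sample, 'Direct Discharge')
```

On the full dataframe, this would need to run before the 'NRM | NRS' values are replaced with zeros, since the code is no longer distinguishable afterwards.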
# I noticed some values are "0" or "0.00E+00".
df_npf[df_npf['Stack Emissions'].isin(['0', '0.00E+00'])][['Year','Facility Name', 'Substance Name (English)', 'Stack Emissions', 'Direct Discharge']].sort_values(by =['Facility Name', 'Substance Name (English)', 'Year'])
| | Year | Facility Name | Substance Name (English) | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|
| 183 | 2013 | BWXT - Peterborough | Estimated public dose (see footnote) | 0 | NRM \| NRS |
| 162 | 2014 | BWXT - Peterborough | Estimated public dose (see footnote) | 0 | NRM \| NRS |
| 141 | 2015 | BWXT - Peterborough | Estimated public dose (see footnote) | 0 | NRM \| NRS |
| 120 | 2016 | BWXT - Peterborough | Estimated public dose (see footnote) | 0 | NRM \| NRS |
| 99 | 2017 | BWXT - Peterborough | Estimated public dose (see footnote) | 0 | NRM \| NRS |
| 77 | 2018 | BWXT - Peterborough | Estimated public dose (see footnote) | 0 | NRM \| NRS |
| 33 | 2020 | BWXT - Peterborough | Estimated public dose (see footnote) | 0 | NRM \| NRS |
| 11 | 2021 | BWXT - Peterborough | Estimated public dose (see footnote) | 0 | NRM \| NRS |
| 10 | 2021 | BWXT - Peterborough | Uranium | 0 | NRM \| NRS |
| 37 | 2020 | Nordion - Ottawa | Cobalt-60 | 0.00E+00 | NRM \| NRS |
| 82 | 2018 | Nordion - Ottawa | Iodine-125 | 0.00E+00 | NRM \| NRS |
| 60 | 2019 | Nordion - Ottawa | Iodine-125 | 0.00E+00 | NRM \| NRS |
| 38 | 2020 | Nordion - Ottawa | Iodine-125 | 0.00E+00 | NRM \| NRS |
| 16 | 2021 | Nordion - Ottawa | Iodine-125 | 0.00E+00 | NRM \| NRS |
| 61 | 2019 | Nordion - Ottawa | Iodine-131 | 0.00E+00 | NRM \| NRS |
| 39 | 2020 | Nordion - Ottawa | Iodine-131 | 0.00E+00 | NRM \| NRS |
| 17 | 2021 | Nordion - Ottawa | Iodine-131 | 0.00E+00 | NRM \| NRS |
| 106 | 2017 | Nordion - Ottawa | Xenon-133 | 0.00E+00 | NRM \| NRS |
| 84 | 2018 | Nordion - Ottawa | Xenon-133 | 0.00E+00 | NRM \| NRS |
| 62 | 2019 | Nordion - Ottawa | Xenon-133 | 0.00E+00 | NRM \| NRS |
| 40 | 2020 | Nordion - Ottawa | Xenon-133 | 0.00E+00 | NRM \| NRS |
| 18 | 2021 | Nordion - Ottawa | Xenon-133 | 0.00E+00 | NRM \| NRS |
| 107 | 2017 | Nordion - Ottawa | Xenon-135 | 0.00E+00 | NRM \| NRS |
| 85 | 2018 | Nordion - Ottawa | Xenon-135 | 0.00E+00 | NRM \| NRS |
| 63 | 2019 | Nordion - Ottawa | Xenon-135 | 0.00E+00 | NRM \| NRS |
| 41 | 2020 | Nordion - Ottawa | Xenon-135 | 0.00E+00 | NRM \| NRS |
| 19 | 2021 | Nordion - Ottawa | Xenon-135 | 0.00E+00 | NRM \| NRS |
| 108 | 2017 | Nordion - Ottawa | Xenon-135m | 0.00E+00 | NRM \| NRS |
| 86 | 2018 | Nordion - Ottawa | Xenon-135m | 0.00E+00 | NRM \| NRS |
| 64 | 2019 | Nordion - Ottawa | Xenon-135m | 0.00E+00 | NRM \| NRS |
| 42 | 2020 | Nordion - Ottawa | Xenon-135m | 0.00E+00 | NRM \| NRS |
| 20 | 2021 | Nordion - Ottawa | Xenon-135m | 0.00E+00 | NRM \| NRS |
(4) Summary of Zero Values (all Stack Emissions):
df_npf_geography = df_npf[['Facility Name', 'NPRI ID', 'Company Name', 'City', 'CSD', 'CA or CMA', 'Economic Region', 'Province', 'Latitude', 'Longitude']]
df_npf_geography
| | Facility Name | NPRI ID | Company Name | City | CSD | CA or CMA | Economic Region | Province | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Blind River Refinery | 3657.0 | Cameco | Blind River | NaN | NaN | NaN | ON | 46.1814 | -83.0177 |
| 1 | Blind River Refinery | 3657.0 | Cameco | Blind River | NaN | NaN | NaN | ON | 46.1814 | -83.0177 |
| 2 | Blind River Refinery | 3657.0 | Cameco | Blind River | NaN | NaN | NaN | ON | 46.1814 | -83.0177 |
| 3 | Port Hope Conversion Facility | 1145.0 | Cameco | Port Hope | NaN | NaN | NaN | ON | 43.9437 | -78.2954 |
| 4 | Port Hope Conversion Facility | 1145.0 | Cameco | Port Hope | NaN | NaN | NaN | ON | 43.9437 | -78.2954 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 189 | Nordion - Ottawa | 2247.0 | Nordion (Canada) Inc. | Ottawa | NaN | NaN | NaN | ON | 45.3408 | -75.9179 |
| 190 | Nordion - Ottawa | 2247.0 | Nordion (Canada) Inc. | Ottawa | NaN | NaN | NaN | ON | 45.3408 | -75.9179 |
| 191 | Nordion - Ottawa | 2247.0 | Nordion (Canada) Inc. | Ottawa | NaN | NaN | NaN | ON | 45.3408 | -75.9179 |
| 192 | Nordion - Ottawa | 2247.0 | Nordion (Canada) Inc. | Ottawa | NaN | NaN | NaN | ON | 45.3408 | -75.9179 |
| 193 | Nordion - Ottawa | 2247.0 | Nordion (Canada) Inc. | Ottawa | NaN | NaN | NaN | ON | 45.3408 | -75.9179 |
194 rows × 10 columns
# Cleaning the geography dataframe:
df_npf_geography = df_npf_geography.drop_duplicates().reset_index(drop=True)
df_npf_geography
| | Facility Name | NPRI ID | Company Name | City | CSD | CA or CMA | Economic Region | Province | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Blind River Refinery | 3657.0 | Cameco | Blind River | NaN | NaN | NaN | ON | 46.1814 | -83.0177 |
| 1 | Port Hope Conversion Facility | 1145.0 | Cameco | Port Hope | NaN | NaN | NaN | ON | 43.9437 | -78.2954 |
| 2 | Cameco Fuel Manufacturing | NaN | Cameco | Port Hope | NaN | NaN | NaN | ON | 43.9540 | -78.2748 |
| 3 | BWXT - Toronto | NaN | BWXT Nuclear Energy Canada | Toronto | NaN | NaN | NaN | ON | 43.6680 | -79.4468 |
| 4 | BWXT - Peterborough | NaN | BWXT Nuclear Energy Canada | Peterborough | NaN | NaN | NaN | ON | 44.2961 | -78.3337 |
| 5 | SRBT | NaN | SRB Technologies (Canada) Inc. | Pembroke | NaN | NaN | NaN | ON | 45.8054 | -77.1180 |
| 6 | Nordion - Ottawa | 2247.0 | Nordion (Canada) Inc. | Ottawa | NaN | NaN | NaN | ON | 45.3408 | -75.9179 |
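# Replacing the NRM (Not Required to Monitor) values with zeros so I can convert the columns to numeric for later plotting.
# (Assigning the result back, since an in-place replace on a selected column may not update df_npf in recent pandas versions.)
df_npf['Stack Emissions'] = df_npf['Stack Emissions'].replace('NRM | NRS', 0)
df_npf['Direct Discharge'] = df_npf['Direct Discharge'].replace('NRM | NRS', 0)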
# Converted columns to numeric for plotting:
df_npf['Stack Emissions'] = pd.to_numeric(df_npf['Stack Emissions'])
df_npf['Direct Discharge'] = pd.to_numeric(df_npf['Direct Discharge'])
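Replacing 'NRM | NRS' with zero makes "not required to monitor" indistinguishable from the genuine zeros ('0', '0.00E+00') summarized earlier. As an alternative, the sketch below converts the code to NaN and keeps a separate boolean flag. The `to_numeric_with_flag` helper is hypothetical and is not what the notebook does; the sample strings are values that appear in the Stack Emissions column.

```python
import pandas as pd

def to_numeric_with_flag(series, code='NRM | NRS'):
    """Return (numeric values with NaN for the code, boolean mask of coded entries)."""
    not_monitored = series == code
    # mask() blanks the coded entries; 'coerce' turns any other non-numeric text into NaN too.
    values = pd.to_numeric(series.mask(not_monitored), errors='coerce')
    return values, not_monitored

raw = pd.Series(['3.2', 'NRM | NRS', '0.00E+00', '39'])
values, not_monitored = to_numeric_with_flag(raw)
```

Matplotlib leaves gaps where a line series contains NaN, so plots built this way would show unmonitored years as breaks rather than as apparent drops to zero.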
# I noticed that, for Port Hope Conversion Facility, the Estimated public dose footnotes specify whether the value is for "Site 1" or "Site 2":
df_npf[(df_npf['Facility Name'] == 'Port Hope Conversion Facility') & (df_npf['Substance Name (English)'] == 'Estimated public dose (see footnote)')].sort_values(by = ['Footnotes', 'Year'])
| | Year | NPRI ID | Company Name | Facility Name | City | CSD | CA or CMA | Economic Region | Province | Latitude | Longitude | Substance Name (English) | Substance Name (French) \| Nom de substance (Français) | Units | Stack Emissions | Direct Discharge | Footnotes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 177 | 2013 | 1145.0 | Cameco | Port Hope Conversion Facility | Port Hope | NaN | NaN | NaN | ON | 43.9437 | -78.2954 | Estimated public dose (see footnote) | Dose estimée au public (voir note de bas de page) | mSv/a | 0.021 | 0.0 | Estimated public dose is calculated incorporat... |
| 156 | 2014 | 1145.0 | Cameco | Port Hope Conversion Facility | Port Hope | NaN | NaN | NaN | ON | 43.9437 | -78.2954 | Estimated public dose (see footnote) | Dose estimée au public (voir note de bas de page) | mSv/a | 0.012 | 0.0 | Estimated public dose is calculated incorporat... |
| 135 | 2015 | 1145.0 | Cameco | Port Hope Conversion Facility | Port Hope | NaN | NaN | NaN | ON | 43.9437 | -78.2954 | Estimated public dose (see footnote) | Dose estimée au public (voir note de bas de page) | mSv/a | 0.006 | 0.0 | Estimated public dose is calculated incorporat... |
| 114 | 2016 | 1145.0 | Cameco | Port Hope Conversion Facility | Port Hope | NaN | NaN | NaN | ON | 43.9437 | -78.2954 | Estimated public dose (see footnote) | Dose estimée au public (voir note de bas de page) | mSv/a | 0.020 | 0.0 | Estimated public dose is calculated incorporat... |
| 92 | 2017 | 1145.0 | Cameco | Port Hope Conversion Facility | Port Hope | NaN | NaN | NaN | ON | 43.9437 | -78.2954 | Estimated public dose (see footnote) | Dose estimée au public (voir note de bas de page) | mSv/a | 0.110 | 0.0 | Site 1, Estimated public dose is calculated in... |
| 70 | 2018 | 1145.0 | Cameco | Port Hope Conversion Facility | Port Hope | NaN | NaN | NaN | ON | 43.9437 | -78.2954 | Estimated public dose (see footnote) | Dose estimée au public (voir note de bas de page) | mSv/a | 0.142 | 0.0 | Site 1, Estimated public dose is calculated in... |
| 48 | 2019 | 1145.0 | Cameco | Port Hope Conversion Facility | Port Hope | NaN | NaN | NaN | ON | 43.9437 | -78.2954 | Estimated public dose (see footnote) | Dose estimée au public (voir note de bas de page) | mSv/a | 0.080 | 0.0 | Site 1, Estimated public dose is calculated in... |
| 26 | 2020 | 1145.0 | Cameco | Port Hope Conversion Facility | Port Hope | NaN | NaN | NaN | ON | 43.9437 | -78.2954 | Estimated public dose (see footnote) | Dose estimée au public (voir note de bas de page) | mSv/a | 0.129 | 0.0 | Site 1, Estimated public dose is calculated in... |
| 4 | 2021 | 1145.0 | Cameco | Port Hope Conversion Facility | Port Hope | NaN | NaN | NaN | ON | 43.9437 | -78.2954 | Estimated public dose (see footnote) | Dose estimée au public (voir note de bas de page) | mSv/a | 0.072 | 0.0 | Site 1, Estimated public dose is calculated in... |
| 93 | 2017 | 1145.0 | Cameco | Port Hope Conversion Facility | Port Hope | NaN | NaN | NaN | ON | 43.9437 | -78.2954 | Estimated public dose (see footnote) | Dose estimée au public (voir note de bas de page) | mSv/a | 0.153 | 0.0 | Site 2, Estimated public dose is calculated in... |
| 71 | 2018 | 1145.0 | Cameco | Port Hope Conversion Facility | Port Hope | NaN | NaN | NaN | ON | 43.9437 | -78.2954 | Estimated public dose (see footnote) | Dose estimée au public (voir note de bas de page) | mSv/a | 0.173 | 0.0 | Site 2, Estimated public dose is calculated in... |
| 49 | 2019 | 1145.0 | Cameco | Port Hope Conversion Facility | Port Hope | NaN | NaN | NaN | ON | 43.9437 | -78.2954 | Estimated public dose (see footnote) | Dose estimée au public (voir note de bas de page) | mSv/a | 0.127 | 0.0 | Site 2, Estimated public dose is calculated in... |
| 27 | 2020 | 1145.0 | Cameco | Port Hope Conversion Facility | Port Hope | NaN | NaN | NaN | ON | 43.9437 | -78.2954 | Estimated public dose (see footnote) | Dose estimée au public (voir note de bas de page) | mSv/a | 0.117 | 0.0 | Site 2, Estimated public dose is calculated in... |
| 5 | 2021 | 1145.0 | Cameco | Port Hope Conversion Facility | Port Hope | NaN | NaN | NaN | ON | 43.9437 | -78.2954 | Estimated public dose (see footnote) | Dose estimée au public (voir note de bas de page) | mSv/a | 0.086 | 0.0 | Site 2, Estimated public dose is calculated in... |
(5) Why are Port Hope Conversion Facility's Estimated Public Dose values combined from 2013 to 2016, but separated between Site 1 & Site 2 from 2017 to 2021?
df_npf.drop(columns=['NPRI ID', 'Company Name', 'City', 'CSD', 'CA or CMA', 'Economic Region', 'Province', 'Latitude', 'Longitude', 'Substance Name (French) | Nom de substance (Français)', 'Footnotes'], inplace=True)
df_npf.head()
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 0 | 2021 | Blind River Refinery | Uranium | kg | 3.200 | 2.2 |
| 1 | 2021 | Blind River Refinery | Radium-226 | MBq | 0.000 | 2.2 |
| 2 | 2021 | Blind River Refinery | Estimated public dose (see footnote) | mSv/a | 0.009 | 0.0 |
| 3 | 2021 | Port Hope Conversion Facility | Uranium | kg | 39.000 | 0.0 |
| 4 | 2021 | Port Hope Conversion Facility | Estimated public dose (see footnote) | mSv/a | 0.072 | 0.0 |
# Aggregate Port Hope Site 1 & Site 2 Estimated Public Dose into a single value per year, so every year has one value to plot:
df_npf = df_npf.groupby(['Year', 'Facility Name', 'Substance Name (English)', 'Units'],as_index=False).agg({'Stack Emissions': 'sum', 'Direct Discharge': 'sum'})
df_npf[(df_npf['Facility Name'] == 'Port Hope Conversion Facility') & (df_npf['Substance Name (English)'] == 'Estimated public dose (see footnote)')]
df_npf.head()
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 0 | 2013 | BWXT - Peterborough | Estimated public dose (see footnote) | mSv/a | 0.000000 | 0.0 |
| 1 | 2013 | BWXT - Peterborough | Uranium | kg | 0.000013 | 0.0 |
| 2 | 2013 | BWXT - Toronto | Estimated public dose (see footnote) | mSv/a | 0.000600 | 0.0 |
| 3 | 2013 | BWXT - Toronto | Uranium | kg | 0.010400 | 0.0 |
| 4 | 2013 | Blind River Refinery | Estimated public dose (see footnote) | mSv/a | 0.012000 | 0.0 |
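Summing Site 1 and Site 2 is one way to obtain a single value per year. Whether the two doses are additive depends on how the sites are defined, which the dataset does not state. The sketch below compares the sum with the maximum for the 2017 Port Hope values shown in the table above. The `Site` column and the names `dose` and `per_year` are hypothetical; the real data only carries the site inside the footnote text.

```python
import pandas as pd

# Port Hope 2017 values copied from the table above. The "Site" column is a
# hypothetical helper; the dataset only mentions the site in Footnotes.
dose = pd.DataFrame({
    "Year": [2017, 2017],
    "Facility Name": ["Port Hope Conversion Facility"] * 2,
    "Site": ["Site 1", "Site 2"],
    "Stack Emissions": [0.110, 0.153],
})

# Named aggregation: compute both candidate summaries side by side.
per_year = dose.groupby(["Year", "Facility Name"], as_index=False).agg(
    dose_sum=("Stack Emissions", "sum"),
    dose_max=("Stack Emissions", "max"),
)
print(per_year)
```

If the sites represent different receptor locations rather than additive contributions, the maximum may be the more comparable figure against the single values reported for 2013 to 2016.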
# Save the cleaned dataframe for use in a Tableau dashboard.
df_npf.to_csv("./Datasets/df_npf.csv", index=True, header=True)
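Backslashes inside an ordinary Python string literal are read as escape characters, so Windows-style paths are fragile. A small sketch of building the output path with `pathlib` instead, assuming the same `Datasets` folder; `out_path` is a name introduced here for illustration.

```python
from pathlib import Path

# Hypothetical output location mirroring the folder used in this notebook.
out_path = Path("Datasets") / "df_npf.csv"

# as_posix() shows the path with forward slashes on every platform.
print(out_path.as_posix())
```

`DataFrame.to_csv` accepts a path object, so `out_path` could be passed to it directly.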
df_npf['Substance Name (English)'].unique()
array(['Estimated public dose (see footnote)', 'Uranium', 'Radium-226',
'Cobalt-60', 'Iodine-125', 'Iodine-131', 'Xenon-133', 'Xenon-135',
'Xenon-135m', 'Elemental Tritium (HT)', 'Tritium (HTO)'],
dtype=object)
df_npf_epd = df_npf[df_npf['Substance Name (English)'] == 'Estimated public dose (see footnote)']
df_npf_epd.head()
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 0 | 2013 | BWXT - Peterborough | Estimated public dose (see footnote) | mSv/a | 0.0000 | 0.0 |
| 2 | 2013 | BWXT - Toronto | Estimated public dose (see footnote) | mSv/a | 0.0006 | 0.0 |
| 4 | 2013 | Blind River Refinery | Estimated public dose (see footnote) | mSv/a | 0.0120 | 0.0 |
| 7 | 2013 | Cameco Fuel Manufacturing | Estimated public dose (see footnote) | mSv/a | 0.0130 | 0.0 |
| 10 | 2013 | Nordion - Ottawa | Estimated public dose (see footnote) | mSv/a | 0.0195 | 0.0 |
# Estimated public dose is calculated incorporating all major release pathways (emissions and discharges).
plt.figure(figsize=(16,6))
year = df_npf_epd['Year'].unique()
for facility in df_npf_epd['Facility Name'].unique():
    plt.plot(year, df_npf_epd[df_npf_epd['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Estimated public dose [mSv/a]', size=12)
plt.legend(df_npf_epd['Facility Name'].unique(), loc='upper left')
plt.grid()
plt.show()
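The loop above plots every facility against the same `year` array, which only works while each facility has exactly one row for every year. A minimal sketch of a more defensive layout is to pivot the data so that years are rows and facilities are columns. The toy frame and the names `long_df` and `wide` are assumptions, and the values are placeholders rather than reported doses.

```python
import pandas as pd

# Toy long-format data; facility "B" has no 2014 row on purpose.
long_df = pd.DataFrame({
    "Year": [2013, 2014, 2015, 2013, 2015],
    "Facility Name": ["A", "A", "A", "B", "B"],
    "Stack Emissions": [0.012, 0.010, 0.011, 0.020, 0.018],
})

# One row per year, one column per facility; missing years become NaN.
wide = long_df.pivot(index="Year", columns="Facility Name", values="Stack Emissions")
print(wide)

# wide.plot(figsize=(16, 6), grid=True) would then draw one line per
# facility with a matching legend (requires matplotlib).
```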
(6) Why is Port Hope's Estimated Public Dose elevated from 2017 to 2021, with peaks in 2018 & 2020? (Site 1 & Site 2 contribute comparable amounts to these peaks.)
(7) Why is there a peak in Cameco Fuel Manufacturing in 2021?
df_npf_uranium = df_npf[df_npf['Substance Name (English)'] == 'Uranium']
df_npf_uranium.head()
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 1 | 2013 | BWXT - Peterborough | Uranium | kg | 0.000013 | 0.0 |
| 3 | 2013 | BWXT - Toronto | Uranium | kg | 0.010400 | 0.0 |
| 6 | 2013 | Blind River Refinery | Uranium | kg | 4.100000 | 3.6 |
| 8 | 2013 | Cameco Fuel Manufacturing | Uranium | kg | 0.510000 | 0.0 |
| 17 | 2013 | Port Hope Conversion Facility | Uranium | kg | 68.400000 | 0.0 |
plt.figure(figsize=(16,6))
year = df_npf_uranium['Year'].unique()
for facility in df_npf_uranium['Facility Name'].unique():
    plt.plot(year, df_npf_uranium[df_npf_uranium['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Uranium - Stack Emissions [kg]', size=12)
plt.legend(df_npf_uranium['Facility Name'].unique(), loc='upper center')
plt.grid()
plt.show()
(8) Why does Port Hope emit so much more uranium than the other facilities, with peaks in 2013 & 2019?
# Plotting without Port Hope to take a look at the rest:
df_npf_uranium2 = df_npf_uranium[df_npf_uranium['Facility Name'] != 'Port Hope Conversion Facility']
plt.figure(figsize=(16,6))
year = df_npf_uranium2['Year'].unique()
for facility in df_npf_uranium2['Facility Name'].unique():
    plt.plot(year, df_npf_uranium2[df_npf_uranium2['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Uranium - Stack Emissions [kg]', size=12)
plt.legend(df_npf_uranium2['Facility Name'].unique(), loc='upper center')
plt.grid()
plt.show()
(9) Why does Blind River Refinery have peaks in 2013 & 2020?
plt.figure(figsize=(16,6))
year = df_npf_uranium['Year'].unique()
for facility in df_npf_uranium['Facility Name'].unique():
    plt.plot(year, df_npf_uranium[df_npf_uranium['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Uranium - Direct Discharge [kg]', size=12)
plt.legend(df_npf_uranium['Facility Name'].unique(), loc='upper center')
plt.grid()
plt.show()
(10) Why does Blind River Refinery have a peak in 2013/2014?
All the other substances are individual to each location:
Summary of substances per location:
df_npf_radium = df_npf[df_npf['Substance Name (English)'] == 'Radium-226']
df_npf_radium.head()
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 5 | 2013 | Blind River Refinery | Radium-226 | MBq | 0.0 | 1.93 |
| 26 | 2014 | Blind River Refinery | Radium-226 | MBq | 0.0 | 1.81 |
| 47 | 2015 | Blind River Refinery | Radium-226 | MBq | 0.0 | 1.06 |
| 68 | 2016 | Blind River Refinery | Radium-226 | MBq | 0.0 | 0.92 |
| 89 | 2017 | Blind River Refinery | Radium-226 | MBq | 0.0 | 1.04 |
# No Stack Emissions reported (addressed in question (2)).
plt.figure(figsize=(16,6))
plt.plot(df_npf_radium['Year'], df_npf_radium['Direct Discharge'])
plt.xticks(df_npf_radium['Year'])
plt.xlabel('Year', size=16)
plt.ylabel('Radium-226 - Direct Discharge [MBq]', size=12)
plt.legend(df_npf_radium['Facility Name'], loc='upper left')
plt.grid()
plt.show()
(11) Why does Blind River Refinery have peaks in 2013, 2019, & 2021?
df_npf_HT = df_npf[df_npf['Substance Name (English)'] == 'Elemental Tritium (HT)']
df_npf_HT.head()
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 18 | 2013 | SRBT | Elemental Tritium (HT) | GBq | 61100.0 | 0.0 |
| 39 | 2014 | SRBT | Elemental Tritium (HT) | GBq | 54800.0 | 0.0 |
| 60 | 2015 | SRBT | Elemental Tritium (HT) | GBq | 44700.0 | 0.0 |
| 81 | 2016 | SRBT | Elemental Tritium (HT) | GBq | 22700.0 | 0.0 |
| 102 | 2017 | SRBT | Elemental Tritium (HT) | GBq | 17600.0 | 0.0 |
# No Direct Discharge reported (addressed in question (3)).
plt.figure(figsize=(16,6))
plt.plot(df_npf_HT['Year'], df_npf_HT['Stack Emissions'])
plt.xticks(df_npf_HT['Year'])
plt.xlabel('Year', size=16)
plt.ylabel('Elemental Tritium (HT) - Stack Emissions [GBq]', size=12)
plt.legend(df_npf_HT['Facility Name'], loc='upper right')
plt.grid()
plt.show()
(12) Why does SRBT have a spike in 2013 to 2015 with a peak in 2013?
df_npf_HTO = df_npf[df_npf['Substance Name (English)'] == 'Tritium (HTO)']
df_npf_HTO.head()
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 20 | 2013 | SRBT | Tritium (HTO) | GBq | 17800.0 | 0.0 |
| 41 | 2014 | SRBT | Tritium (HTO) | GBq | 10700.0 | 0.0 |
| 62 | 2015 | SRBT | Tritium (HTO) | GBq | 11500.0 | 0.0 |
| 83 | 2016 | SRBT | Tritium (HTO) | GBq | 6290.0 | 0.0 |
| 104 | 2017 | SRBT | Tritium (HTO) | GBq | 7200.0 | 0.0 |
# No Direct Discharge reported (addressed in question (3)).
plt.figure(figsize=(16,6))
plt.plot(df_npf_HTO['Year'], df_npf_HTO['Stack Emissions'])
plt.xticks(df_npf_HTO['Year'])
plt.xlabel('Year', size=16)
plt.ylabel('Tritium (HTO) - Stack Emissions [GBq]', size=12)
plt.legend(df_npf_HTO['Facility Name'], loc='upper right')
plt.grid()
plt.show()
(13) Why does SRBT have a peak in 2013?
df_npf_cobalt = df_npf[df_npf['Substance Name (English)'] == 'Cobalt-60']
df_npf_cobalt.head()
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 9 | 2013 | Nordion - Ottawa | Cobalt-60 | GBq | 0.0050 | 0.0 |
| 30 | 2014 | Nordion - Ottawa | Cobalt-60 | GBq | 0.0050 | 0.0 |
| 51 | 2015 | Nordion - Ottawa | Cobalt-60 | GBq | 0.0050 | 0.0 |
| 72 | 2016 | Nordion - Ottawa | Cobalt-60 | GBq | 0.0060 | 0.0 |
| 93 | 2017 | Nordion - Ottawa | Cobalt-60 | GBq | 0.0034 | 0.0 |
# No Direct Discharge reported (addressed in question (3)). Zero value for 2020 addressed in question (4).
plt.figure(figsize=(16,6))
plt.plot(df_npf_cobalt['Year'], df_npf_cobalt['Stack Emissions'])
plt.xticks(df_npf_cobalt['Year'])
plt.xlabel('Year', size=16)
plt.ylabel('Cobalt-60 - Stack Emissions [GBq]', size=12)
plt.legend(df_npf_cobalt['Facility Name'], loc='upper right')
plt.grid()
plt.show()
(14) Why does Nordion-Ottawa have a spike in 2013 to 2018 with a peak in 2016?
df_npf_i125 = df_npf[df_npf['Substance Name (English)'] == 'Iodine-125']
df_npf_i125.head()
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 11 | 2013 | Nordion - Ottawa | Iodine-125 | GBq | 0.2300 | 0.0 |
| 32 | 2014 | Nordion - Ottawa | Iodine-125 | GBq | 0.1400 | 0.0 |
| 53 | 2015 | Nordion - Ottawa | Iodine-125 | GBq | 0.1200 | 0.0 |
| 74 | 2016 | Nordion - Ottawa | Iodine-125 | GBq | 0.2100 | 0.0 |
| 95 | 2017 | Nordion - Ottawa | Iodine-125 | GBq | 0.0012 | 0.0 |
# No Direct Discharge reported (addressed in question (3)). Zero value for 2018 to 2021 addressed in question (4).
plt.figure(figsize=(16,6))
plt.plot(df_npf_i125['Year'], df_npf_i125['Stack Emissions'])
plt.xticks(df_npf_i125['Year'])
plt.xlabel('Year', size=16)
plt.ylabel('Iodine-125 - Stack Emissions [GBq]', size=12)
plt.legend(df_npf_i125['Facility Name'], loc='upper right')
plt.grid()
plt.show()
(15) Why does Nordion-Ottawa have a spike in 2013 to 2016 with peaks in 2013 & 2016?
df_npf_i131 = df_npf[df_npf['Substance Name (English)'] == 'Iodine-131']
df_npf_i131.head()
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 12 | 2013 | Nordion - Ottawa | Iodine-131 | GBq | 0.3900 | 0.0 |
| 33 | 2014 | Nordion - Ottawa | Iodine-131 | GBq | 0.4600 | 0.0 |
| 54 | 2015 | Nordion - Ottawa | Iodine-131 | GBq | 0.1500 | 0.0 |
| 75 | 2016 | Nordion - Ottawa | Iodine-131 | GBq | 0.3500 | 0.0 |
| 96 | 2017 | Nordion - Ottawa | Iodine-131 | GBq | 0.0008 | 0.0 |
# No Direct Discharge reported (addressed in question (3)). Zero value for 2019 to 2021 addressed in question (4).
plt.figure(figsize=(16,6))
plt.plot(df_npf_i131['Year'], df_npf_i131['Stack Emissions'])
plt.xticks(df_npf_i131['Year'])
plt.xlabel('Year', size=16)
plt.ylabel('Iodine-131 - Stack Emissions [GBq]', size=12)
plt.legend(df_npf_i131['Facility Name'], loc='upper right')
plt.grid()
plt.show()
(16) Why does Nordion-Ottawa have a spike in 2013 to 2016 with peaks in 2014 & 2016, and why do emissions decrease so much after 2017?
df_npf_x133 = df_npf[df_npf['Substance Name (English)'] == 'Xenon-133']
df_npf_x133.head()
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 13 | 2013 | Nordion - Ottawa | Xenon-133 | GBq | 30700.0 | 0.0 |
| 34 | 2014 | Nordion - Ottawa | Xenon-133 | GBq | 15000.0 | 0.0 |
| 55 | 2015 | Nordion - Ottawa | Xenon-133 | GBq | 11900.0 | 0.0 |
| 76 | 2016 | Nordion - Ottawa | Xenon-133 | GBq | 7280.0 | 0.0 |
| 97 | 2017 | Nordion - Ottawa | Xenon-133 | GBq | 0.0 | 0.0 |
# No Direct Discharge reported (addressed in question (3)). Zero value for 2017 to 2021 addressed in question (4).
plt.figure(figsize=(16,6))
plt.plot(df_npf_x133['Year'], df_npf_x133['Stack Emissions'])
plt.xticks(df_npf_x133['Year'])
plt.xlabel('Year', size=16)
plt.ylabel('Xenon-133 - Stack Emissions [GBq]', size=12)
plt.legend(df_npf_x133['Facility Name'], loc='upper right')
plt.grid()
plt.show()
(17) Why does Nordion-Ottawa have a spike in 2013 to 2016 with a peak in 2013, and why do emissions decrease so much after 2017?
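Questions like this one rest on reading the plot by eye. As a hedged sketch, the year-over-year change can also be computed directly, which gives a number to quote alongside the question. The values are the five Xenon-133 rows from the `head()` output above; the names `x133` and `change` are introduced here for illustration.

```python
import pandas as pd

# Xenon-133 stack emissions [GBq] for Nordion - Ottawa, 2013 to 2017,
# copied from the head() output above.
x133 = pd.Series(
    [30700.0, 15000.0, 11900.0, 7280.0, 0.0],
    index=[2013, 2014, 2015, 2016, 2017],
    name="Xenon-133",
)

# Fractional change relative to the previous year; the first year has no
# predecessor and is therefore NaN.
change = x133.pct_change()
print((change * 100).round(1))
```

Note that the 2017 value in this series is the 0 that replaced an NRM marker or a reported zero, so a change of -100% there may reflect a reporting change rather than a measured drop.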
df_npf_x135 = df_npf[df_npf['Substance Name (English)'] == 'Xenon-135']
df_npf_x135.head()
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 14 | 2013 | Nordion - Ottawa | Xenon-135 | GBq | 28200.0 | 0.0 |
| 35 | 2014 | Nordion - Ottawa | Xenon-135 | GBq | 13100.0 | 0.0 |
| 56 | 2015 | Nordion - Ottawa | Xenon-135 | GBq | 8240.0 | 0.0 |
| 77 | 2016 | Nordion - Ottawa | Xenon-135 | GBq | 4300.0 | 0.0 |
| 98 | 2017 | Nordion - Ottawa | Xenon-135 | GBq | 0.0 | 0.0 |
# No Direct Discharge reported (addressed in question (3)). Zero value for 2017 to 2021 addressed in question (4).
plt.figure(figsize=(16,6))
plt.plot(df_npf_x135['Year'], df_npf_x135['Stack Emissions'])
plt.xticks(df_npf_x135['Year'])
plt.xlabel('Year', size=16)
plt.ylabel('Xenon-135 - Stack Emissions [GBq]', size=12)
plt.legend(df_npf_x135['Facility Name'], loc='upper right')
plt.grid()
plt.show()
(18) Why does Nordion-Ottawa have a spike in 2013 to 2016 with a peak in 2013, and why do emissions decrease so much after 2017?
df_npf_x135m = df_npf[df_npf['Substance Name (English)'] == 'Xenon-135m']
df_npf_x135m.head()
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 15 | 2013 | Nordion - Ottawa | Xenon-135m | GBq | 43400.0 | 0.0 |
| 36 | 2014 | Nordion - Ottawa | Xenon-135m | GBq | 18200.0 | 0.0 |
| 57 | 2015 | Nordion - Ottawa | Xenon-135m | GBq | 10800.0 | 0.0 |
| 78 | 2016 | Nordion - Ottawa | Xenon-135m | GBq | 5420.0 | 0.0 |
| 99 | 2017 | Nordion - Ottawa | Xenon-135m | GBq | 0.0 | 0.0 |
# No Direct Discharge reported (addressed in question (3)). Zero value for 2017 to 2021 addressed in question (4).
plt.figure(figsize=(16,6))
plt.plot(df_npf_x135m['Year'], df_npf_x135m['Stack Emissions'])
plt.xticks(df_npf_x135m['Year'])
plt.xlabel('Year', size=16)
plt.ylabel('Xenon-135m - Stack Emissions [GBq]', size=12)
plt.legend(df_npf_x135m['Facility Name'], loc='upper right')
plt.grid()
plt.show()
(19) Why does Nordion-Ottawa have a spike in 2013 to 2016 with a peak in 2013, and why do emissions decrease so much after 2017?
facilities = df_npf['Facility Name'].unique()
for f in facilities:
    df = df_npf[df_npf['Facility Name'] == f]
    print(f,'\n')
    subs = df['Substance Name (English)'].unique()
    for s in subs:
        df2 = df[df['Substance Name (English)'] == s]
        fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10,8))
        fig.subplots_adjust(hspace=0.5)
        ax1.plot(df2['Year'], df2['Stack Emissions'], color='green')
        ax1.set_title(s + ' - Stack Emissions', fontdict={'fontsize': 14, 'fontweight': 'medium'})
        ax1.set_xlabel('Year', fontdict={'fontsize': 14, 'fontweight': 'medium'})
        ax1.grid()
        ax2.plot(df2['Year'], df2['Direct Discharge'], color='red')
        ax2.set_title(s + ' - Direct Discharge', fontdict={'fontsize': 14, 'fontweight': 'medium'})
        ax2.set_xlabel('Year', fontdict={'fontsize': 14, 'fontweight': 'medium'})
        ax2.grid()
        plt.show()
BWXT - Peterborough
BWXT - Toronto
Blind River Refinery
Cameco Fuel Manufacturing
Nordion - Ottawa
Port Hope Conversion Facility
SRBT
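The nested loop that produces one figure per facility and substance can also be written as a single pass over a two-key `groupby`. This is only a sketch of the iteration pattern: the toy frame `toy`, the dictionary `panels`, and the numeric values are placeholders, not reported data.

```python
import pandas as pd

# Placeholder rows; the numbers are not taken from the dataset.
toy = pd.DataFrame({
    "Year": [2021, 2020, 2020, 2021, 2020, 2021],
    "Facility Name": ["SRBT", "SRBT", "SRBT", "SRBT",
                      "Blind River Refinery", "Blind River Refinery"],
    "Substance Name (English)": ["Tritium (HTO)", "Tritium (HTO)",
                                 "Elemental Tritium (HT)", "Elemental Tritium (HT)",
                                 "Uranium", "Uranium"],
    "Stack Emissions": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Each group is one (facility, substance) pair, i.e. one figure.
panels = {}
for (facility, substance), grp in toy.groupby(
    ["Facility Name", "Substance Name (English)"]
):
    # Sort by year so a line plot runs left to right.
    panels[(facility, substance)] = grp.sort_values("Year")

print(list(panels))
```

Inside the loop, `grp` plays the role of `df2` in the cell above, so the same plotting calls could be applied to it.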
df_npf_2020 = pd.read_csv("./Datasets/2020/Nuclear Processing Facilities.csv", encoding='latin1')
df_npf_2020.head()
| | Year \| Année | NPRI ID \| ID INRP | Company Name \| Raison Sociale | Facility Name \| Nom de l'installation | City \| Ville | CSD \| SDR | CA or CMA \| AR ou RMR | Economic Region \| Région économique | Province \| Province | Latitude \| Latitude | Longitude \| Longitude | Substance Name (English) \| Nom de substance (Anglais) | Substance Name (French) \| Nom de substance (Français) | Units \| Unités | Stack Emissions \| Émissions de cheminées | Direct Discharge \| Évacuations directes | Footnotes \| Notes de bas de page |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020 | 3657.0 | Cameco | Blind River Refinery | Blind River | NaN | NaN | NaN | ON | 46.1814 | -83.0177 | Uranium | Uranium | kg | 4.8 | 2.8 | NaN |
| 1 | 2020 | 3657.0 | Cameco | Blind River Refinery | Blind River | NaN | NaN | NaN | ON | 46.1814 | -83.0177 | Radium-226 | Radium-226 | MBq | NRM \| NRS | 1.4 | NaN |
| 2 | 2020 | 3657.0 | Cameco | Blind River Refinery | Blind River | NaN | NaN | NaN | ON | 46.1814 | -83.0177 | Estimated public dose (see footnote) | Dose estimée au public (voir note de bas de page) | mSv/a | 0.009 | NRM \| NRS | Estimated public dose is calculated incorporat... |
| 3 | 2020 | 1145.0 | Cameco | Port Hope Conversion Facility | Port Hope | NaN | NaN | NaN | ON | 43.9437 | -78.2954 | Uranium | Uranium | kg | 44.4 | NRM \| NRS | NaN |
| 4 | 2020 | 1145.0 | Cameco | Port Hope Conversion Facility | Port Hope | NaN | NaN | NaN | ON | 43.9437 | -78.2954 | Estimated public dose (see footnote) | Dose estimée au public (voir note de bas de page) | mSv/a | 0.129 | NRM \| NRS | Site 1, Estimated public dose is calculated in... |
# Remove the 2021 rows from the latest release and compare what remains with the 2020 release.
df_npf_2021 = df_npf_0[df_npf_0['Year | Année'] != 2021].reset_index(drop = True)
df_npf_2021.head()
| | Year \| Année | NPRI ID \| ID INRP | Company Name \| Raison Sociale | Facility Name \| Nom de l'installation | City \| Ville | CSD \| SDR | CA or CMA \| AR ou RMR | Economic Region \| Région économique | Province \| Province | Latitude \| Latitude | Longitude \| Longitude | Substance Name (English) \| Nom de substance (Anglais) | Substance Name (French) \| Nom de substance (Français) | Units \| Unités | Stack Emissions \| Émissions de cheminées | Direct Discharge \| Évacuations directes | Footnotes \| Notes de bas de page |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020 | 3657.0 | Cameco | Blind River Refinery | Blind River | NaN | NaN | NaN | ON | 46.1814 | -83.0177 | Uranium | Uranium | kg | 4.8 | 2.8 | NaN |
| 1 | 2020 | 3657.0 | Cameco | Blind River Refinery | Blind River | NaN | NaN | NaN | ON | 46.1814 | -83.0177 | Radium-226 | Radium-226 | MBq | NRM \| NRS | 1.4 | NaN |
| 2 | 2020 | 3657.0 | Cameco | Blind River Refinery | Blind River | NaN | NaN | NaN | ON | 46.1814 | -83.0177 | Estimated public dose (see footnote) | Dose estimée au public (voir note de bas de page) | mSv/a | 0.009 | NRM \| NRS | Estimated public dose is calculated incorporat... |
| 3 | 2020 | 1145.0 | Cameco | Port Hope Conversion Facility | Port Hope | NaN | NaN | NaN | ON | 43.9437 | -78.2954 | Uranium | Uranium | kg | 44.4 | NRM \| NRS | NaN |
| 4 | 2020 | 1145.0 | Cameco | Port Hope Conversion Facility | Port Hope | NaN | NaN | NaN | ON | 43.9437 | -78.2954 | Estimated public dose (see footnote) | Dose estimée au public (voir note de bas de page) | mSv/a | 0.129 | NRM \| NRS | Site 1, Estimated public dose is calculated in... |
# Concatenate both dataframes and keep only the rows that are not duplicated, to see the differences. This produces a df with the 2021 release's values first, followed by the 2020 release's values:
df = pd.concat([df_npf_2021,df_npf_2020]).drop_duplicates(keep=False)
df
# 2 changes only.
| | Year \| Année | NPRI ID \| ID INRP | Company Name \| Raison Sociale | Facility Name \| Nom de l'installation | City \| Ville | CSD \| SDR | CA or CMA \| AR ou RMR | Economic Region \| Région économique | Province \| Province | Latitude \| Latitude | Longitude \| Longitude | Substance Name (English) \| Nom de substance (Anglais) | Substance Name (French) \| Nom de substance (Français) | Units \| Unités | Stack Emissions \| Émissions de cheminées | Direct Discharge \| Évacuations directes | Footnotes \| Notes de bas de page |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 50 | 2018 | NaN | Cameco | Cameco Fuel Manufacturing | Port Hope | NaN | NaN | NaN | ON | 43.954 | -78.2748 | Uranium | Uranium | kg | 1.25 | NRM \| NRS | NaN |
| 72 | 2017 | NaN | Cameco | Cameco Fuel Manufacturing | Port Hope | NaN | NaN | NaN | ON | 43.954 | -78.2748 | Uranium | Uranium | kg | 0.57 | NRM \| NRS | NaN |
| 50 | 2018 | NaN | Cameco | Cameco Fuel Manufacturing | Port Hope | NaN | NaN | NaN | ON | 43.954 | -78.2748 | Uranium | Uranium | kg | 1.26 | NRM \| NRS | NaN |
| 72 | 2017 | NaN | Cameco | Cameco Fuel Manufacturing | Port Hope | NaN | NaN | NaN | ON | 43.954 | -78.2748 | Uranium | Uranium | kg | 0.58 | NRM \| NRS | NaN |
(20) Why did these two sets of values change between reports? Why was the change not documented anywhere?
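The concatenate-and-drop-duplicates output lists the old and new rows separately. A hedged sketch of an alternative is to merge the two releases on identifying columns so each revised value sits next to its predecessor. The frames below contain only the four rows shown above; the shortened column name `Substance` and the names `rel_2020`, `rel_2021`, `paired`, and `changed` are assumptions for illustration.

```python
import pandas as pd

# Columns that identify a record across releases (shortened names are hypothetical).
keys = ["Year", "Facility Name", "Substance"]

# Cameco Fuel Manufacturing uranium stack emissions [kg] as shown above.
rel_2021 = pd.DataFrame({
    "Year": [2018, 2017],
    "Facility Name": ["Cameco Fuel Manufacturing"] * 2,
    "Substance": ["Uranium"] * 2,
    "Stack Emissions": [1.25, 0.57],
})
rel_2020 = pd.DataFrame({
    "Year": [2018, 2017],
    "Facility Name": ["Cameco Fuel Manufacturing"] * 2,
    "Substance": ["Uranium"] * 2,
    "Stack Emissions": [1.26, 0.58],
})

# Pair each record with its counterpart, then keep rows whose value changed.
paired = rel_2020.merge(rel_2021, on=keys, suffixes=("_2020", "_2021"))
changed = paired[paired["Stack Emissions_2020"] != paired["Stack Emissions_2021"]]
print(changed)
```

A side-by-side table like this may be easier to cite in a submission than two separate blocks of rows.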
df_cnl = pd.read_csv("./Datasets/Canadian Nuclear Laboratories.csv")
df_cnl
| | Year \| Année | NPRI ID \| ID INRP | Company Name \| Raison Sociale | Facility Name \| Nom de l'installation | City \| Ville | CSD \| SDR | CA or CMA \| AR ou RMR | Economic Region \| Région économique | Province \| Province | Latitude \| Latitude | Longitude \| Longitude | Substance Name (English) \| Nom de substance (Anglais) | Substance Name (French) \| Nom de substance (Français) | Units \| Unités | Stack Emissions \| Émissions de cheminées | Direct Discharge \| Évacuations directes | Footnotes \| Notes de bas de page |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021 | 3147.0 | Canadian Nuclear Laboratories / Laboratoires N... | Chalk River Laboratories | Chalk River | Deep River | Renfrew | Kingston--Pembroke | ON | 46.0554 | -77.3628 | Elemental Tritium (HT) | Tritium élémentaire | Bq | 2.08E+12 | NRM \| NRS | NaN |
| 1 | 2021 | 3147.0 | Canadian Nuclear Laboratories / Laboratoires N... | Chalk River Laboratories | Chalk River | Deep River | Renfrew | Kingston--Pembroke | ON | 46.0554 | -77.3628 | Tritium (HTO) | Tritium (Eau tritiée) | Bq | 2.49E+13 | 1.50E+13 | NaN |
| 2 | 2021 | 3147.0 | Canadian Nuclear Laboratories / Laboratoires N... | Chalk River Laboratories | Chalk River | Deep River | Renfrew | Kingston--Pembroke | ON | 46.0554 | -77.3628 | Carbon-14 | Carbone-14 | Bq | 0.00E+00 | NRM \| NRS | NaN |
| 3 | 2021 | 3147.0 | Canadian Nuclear Laboratories / Laboratoires N... | Chalk River Laboratories | Chalk River | Deep River | Renfrew | Kingston--Pembroke | ON | 46.0554 | -77.3628 | Total noble gases | Total des gaz nobles | Bq-MeV | 0.00E+00 | NRM \| NRS | NaN |
| 4 | 2021 | 3147.0 | Canadian Nuclear Laboratories / Laboratoires N... | Chalk River Laboratories | Chalk River | Deep River | Renfrew | Kingston--Pembroke | ON | 46.0554 | -77.3628 | Iodine-125 | Iode-125 | Bq | 1.76E+06 | NRM \| NRS | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 283 | 2013 | NaN | Canadian Nuclear Laboratories / Laboratoires N... | Douglas Point | Tiverton | Kincardine | NaN | Stratford--Bruce Peninsula | ON | 44.3267 | -81.6000 | Tritium (HTO) | Tritium (Eau tritiée) | Bq | 1.59E+11 | 8.73E+10 | NaN |
| 284 | 2013 | NaN | Canadian Nuclear Laboratories / Laboratoires N... | Douglas Point | Tiverton | Kincardine | NaN | Stratford--Bruce Peninsula | ON | 44.3267 | -81.6000 | Particulate gross beta | Particules bêta brutes | Bq | NRM \| NRS | 5.31E+07 | NaN |
| 285 | 2013 | NaN | Canadian Nuclear Laboratories / Laboratoires N... | Douglas Point | Tiverton | Kincardine | NaN | Stratford--Bruce Peninsula | ON | 44.3267 | -81.6000 | Estimated public dose (see footnote) | Dose estimée au public (voir note de bas de page) | mSv/a | 0.0013 | NRM \| NRS | Includes the entire Bruce site. Estimated publ... |
| 286 | 2013 | NaN | Canadian Nuclear Laboratories / Laboratoires N... | Nuclear Power Demonstration | Rolphton | NaN | NaN | NaN | ON | 46.1868 | -77.6578 | Tritium (HTO) | Tritium (Eau tritiée) | Bq | 6.86E+10 | 1.41E+11 | NaN |
| 287 | 2013 | NaN | Canadian Nuclear Laboratories / Laboratoires N... | Nuclear Power Demonstration | Rolphton | NaN | NaN | NaN | ON | 46.1868 | -77.6578 | Particulate gross beta | Particules bêta brutes | Bq | 6.63E+04 | 9.76E+05 | NaN |
288 rows × 17 columns
(1) Why does the data start in 2013? Can we get older data?
# I'm creating a copy because I will need it later.
df_cnl_0 = df_cnl.copy()
df_cnl["Facility Name | Nom de l'installation"].unique()
array(['Chalk River Laboratories', 'Whiteshell Laboratories',
'Port Granby Project', 'Port Hope Project', 'Douglas Point',
'Nuclear Power Demonstration'], dtype=object)
# Renaming columns to English only:
df_cnl.rename(columns={'Year | Année': 'Year', 'NPRI ID | ID INRP': 'NPRI ID','Company Name | Raison Sociale':'Company Name',"Facility Name | Nom de l'installation":'Facility Name', 'City | Ville':'City', 'CSD | SDR':'CSD','CA or CMA | AR ou RMR':'CA or CMA', 'Economic Region | Région économique':'Economic Region','Province | Province':'Province', 'Latitude | Latitude':'Latitude', 'Longitude | Longitude':'Longitude','Substance Name (English) | Nom de substance (Anglais)':'Substance Name (English)','Units | Unités':'Units', 'Stack Emissions | Émissions de cheminées':'Stack Emissions','Direct Discharge | Évacuations directes':'Direct Discharge','Footnotes | Notes de bas de page':'Footnotes'}, inplace = True)
df_cnl.head()
| | Year | NPRI ID | Company Name | Facility Name | City | CSD | CA or CMA | Economic Region | Province | Latitude | Longitude | Substance Name (English) | Substance Name (French) \| Nom de substance (Français) | Units | Stack Emissions | Direct Discharge | Footnotes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021 | 3147.0 | Canadian Nuclear Laboratories / Laboratoires N... | Chalk River Laboratories | Chalk River | Deep River | Renfrew | Kingston--Pembroke | ON | 46.0554 | -77.3628 | Elemental Tritium (HT) | Tritium élémentaire | Bq | 2.08E+12 | NRM \| NRS | NaN |
| 1 | 2021 | 3147.0 | Canadian Nuclear Laboratories / Laboratoires N... | Chalk River Laboratories | Chalk River | Deep River | Renfrew | Kingston--Pembroke | ON | 46.0554 | -77.3628 | Tritium (HTO) | Tritium (Eau tritiée) | Bq | 2.49E+13 | 1.50E+13 | NaN |
| 2 | 2021 | 3147.0 | Canadian Nuclear Laboratories / Laboratoires N... | Chalk River Laboratories | Chalk River | Deep River | Renfrew | Kingston--Pembroke | ON | 46.0554 | -77.3628 | Carbon-14 | Carbone-14 | Bq | 0.00E+00 | NRM \| NRS | NaN |
| 3 | 2021 | 3147.0 | Canadian Nuclear Laboratories / Laboratoires N... | Chalk River Laboratories | Chalk River | Deep River | Renfrew | Kingston--Pembroke | ON | 46.0554 | -77.3628 | Total noble gases | Total des gaz nobles | Bq-MeV | 0.00E+00 | NRM \| NRS | NaN |
| 4 | 2021 | 3147.0 | Canadian Nuclear Laboratories / Laboratoires N... | Chalk River Laboratories | Chalk River | Deep River | Renfrew | Kingston--Pembroke | ON | 46.0554 | -77.3628 | Iodine-125 | Iode-125 | Bq | 1.76E+06 | NRM \| NRS | NaN |
I noticed some values are expressed as "NRM (Not Required to Monitor)". I will summarize which values are given like that, before replacing them with zeros to be able to plot.
# Stack Emission column first:
df_cnl_miss_stack = df_cnl[df_cnl['Stack Emissions'] == 'NRM | NRS']
df_cnl_miss_stack[['Year','Facility Name', 'Substance Name (English)', 'Stack Emissions']].sort_values(by =['Facility Name', 'Substance Name (English)', 'Year'])
| | Year | Facility Name | Substance Name (English) | Stack Emissions |
|---|---|---|---|---|
| 270 | 2013 | Chalk River Laboratories | Particulate gross alpha | NRM \| NRS |
| 240 | 2014 | Chalk River Laboratories | Particulate gross alpha | NRM \| NRS |
| 210 | 2015 | Chalk River Laboratories | Particulate gross alpha | NRM \| NRS |
| 179 | 2016 | Chalk River Laboratories | Particulate gross alpha | NRM \| NRS |
| 142 | 2017 | Chalk River Laboratories | Particulate gross alpha | NRM \| NRS |
| ... | ... | ... | ... | ... |
| 14 | 2021 | Whiteshell Laboratories | Strontium-90 | NRM \| NRS |
| 149 | 2017 | Whiteshell Laboratories | Uranium-total | NRM \| NRS |
| 112 | 2018 | Whiteshell Laboratories | Uranium-total | NRM \| NRS |
| 78 | 2019 | Whiteshell Laboratories | Uranium-total | NRM \| NRS |
| 44 | 2020 | Whiteshell Laboratories | Uranium-total | NRM \| NRS |
104 rows × 4 columns
(2) Summary of Missing Data (NRM) for Stack Emissions:
Note: Whiteshell Laboratories has a "Uranium-total" substance. I originally thought of combining it with "Uranium", but it has different units, so I didn't.
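
One way to condense a listing like the one above into a per-facility summary is to group the NRM rows and report the span of years for each substance. This is only a sketch: `sample` is a small hypothetical frame with the same column names as `df_cnl`, its rows are illustrative rather than the full dataset, and the output column names (`n_years`, `first_year`, `last_year`) are assumptions.

```python
import pandas as pd

# Hypothetical sample with the same columns as df_cnl (illustrative rows only).
sample = pd.DataFrame({
    "Year": [2013, 2014, 2015, 2017, 2018, 2013],
    "Facility Name": ["Chalk River Laboratories"] * 3
                     + ["Whiteshell Laboratories"] * 2
                     + ["Chalk River Laboratories"],
    "Substance Name (English)": ["Particulate gross alpha"] * 3
                                + ["Uranium-total"] * 2
                                + ["Tritium (HTO)"],
    "Stack Emissions": ["NRM | NRS"] * 5 + ["2.46E+14"],
})

# Keep only the rows that carry the NRM marker.
nrm_rows = sample[sample["Stack Emissions"] == "NRM | NRS"]

# One row per facility/substance: number of NRM years and the first/last year.
nrm_summary = (
    nrm_rows.groupby(["Facility Name", "Substance Name (English)"])["Year"]
    .agg(n_years="count", first_year="min", last_year="max")
    .reset_index()
)
print(nrm_summary)
```

The same grouping could be applied to the `Direct Discharge` column by changing the column name in the filter.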
# Direct Discharge column next:
df_cnl_miss_discharge = df_cnl[df_cnl['Direct Discharge'] == 'NRM | NRS']
df_cnl_miss_discharge[['Year','Facility Name', 'Substance Name (English)', 'Direct Discharge']].sort_values(by =['Facility Name', 'Substance Name (English)', 'Year'])
| | Year | Facility Name | Substance Name (English) | Direct Discharge |
|---|---|---|---|---|
| 268 | 2013 | Chalk River Laboratories | Argon-41 | NRM \| NRS |
| 238 | 2014 | Chalk River Laboratories | Argon-41 | NRM \| NRS |
| 208 | 2015 | Chalk River Laboratories | Argon-41 | NRM \| NRS |
| 177 | 2016 | Chalk River Laboratories | Argon-41 | NRM \| NRS |
| 141 | 2017 | Chalk River Laboratories | Argon-41 | NRM \| NRS |
| ... | ... | ... | ... | ... |
| 214 | 2015 | Whiteshell Laboratories | Tritium (HTO) | NRM \| NRS |
| 183 | 2016 | Whiteshell Laboratories | Tritium (HTO) | NRM \| NRS |
| 146 | 2017 | Whiteshell Laboratories | Tritium (HTO) | NRM \| NRS |
| 109 | 2018 | Whiteshell Laboratories | Tritium (HTO) | NRM \| NRS |
| 11 | 2021 | Whiteshell Laboratories | Tritium (HTO) | NRM \| NRS |
116 rows × 4 columns
(3) Summary of Missing Data (NRM) for Direct Discharge:
# I noticed some values are "0.00E+00".
df_cnl[df_cnl['Stack Emissions'] == '0.00E+00'][['Year','Facility Name', 'Substance Name (English)', 'Stack Emissions', 'Direct Discharge']].sort_values(by =['Facility Name', 'Substance Name (English)', 'Year'])
| | Year | Facility Name | Substance Name (English) | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|
| 70 | 2019 | Chalk River Laboratories | Argon-41 | 0.00E+00 | NRM \| NRS |
| 36 | 2020 | Chalk River Laboratories | Argon-41 | 0.00E+00 | NRM \| NRS |
| 6 | 2021 | Chalk River Laboratories | Argon-41 | 0.00E+00 | NRM \| NRS |
| 2 | 2021 | Chalk River Laboratories | Carbon-14 | 0.00E+00 | NRM \| NRS |
| 67 | 2019 | Chalk River Laboratories | Total noble gases | 0.00E+00 | NRM \| NRS |
| 33 | 2020 | Chalk River Laboratories | Total noble gases | 0.00E+00 | NRM \| NRS |
| 3 | 2021 | Chalk River Laboratories | Total noble gases | 0.00E+00 | NRM \| NRS |
| 24 | 2021 | Douglas Point | Particulate gross alpha | 0.00E+00 | 5.55E+06 |
(4) Summary of Zero Values (all Stack Emissions):
df_cnl_geography = df_cnl[['Facility Name', 'NPRI ID', 'Company Name', 'City', 'CSD', 'CA or CMA', 'Economic Region', 'Province', 'Latitude', 'Longitude']]
df_cnl_geography
| | Facility Name | NPRI ID | Company Name | City | CSD | CA or CMA | Economic Region | Province | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Chalk River Laboratories | 3147.0 | Canadian Nuclear Laboratories / Laboratoires N... | Chalk River | Deep River | Renfrew | Kingston--Pembroke | ON | 46.0554 | -77.3628 |
| 1 | Chalk River Laboratories | 3147.0 | Canadian Nuclear Laboratories / Laboratoires N... | Chalk River | Deep River | Renfrew | Kingston--Pembroke | ON | 46.0554 | -77.3628 |
| 2 | Chalk River Laboratories | 3147.0 | Canadian Nuclear Laboratories / Laboratoires N... | Chalk River | Deep River | Renfrew | Kingston--Pembroke | ON | 46.0554 | -77.3628 |
| 3 | Chalk River Laboratories | 3147.0 | Canadian Nuclear Laboratories / Laboratoires N... | Chalk River | Deep River | Renfrew | Kingston--Pembroke | ON | 46.0554 | -77.3628 |
| 4 | Chalk River Laboratories | 3147.0 | Canadian Nuclear Laboratories / Laboratoires N... | Chalk River | Deep River | Renfrew | Kingston--Pembroke | ON | 46.0554 | -77.3628 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 283 | Douglas Point | NaN | Canadian Nuclear Laboratories / Laboratoires N... | Tiverton | Kincardine | NaN | Stratford--Bruce Peninsula | ON | 44.3267 | -81.6000 |
| 284 | Douglas Point | NaN | Canadian Nuclear Laboratories / Laboratoires N... | Tiverton | Kincardine | NaN | Stratford--Bruce Peninsula | ON | 44.3267 | -81.6000 |
| 285 | Douglas Point | NaN | Canadian Nuclear Laboratories / Laboratoires N... | Tiverton | Kincardine | NaN | Stratford--Bruce Peninsula | ON | 44.3267 | -81.6000 |
| 286 | Nuclear Power Demonstration | NaN | Canadian Nuclear Laboratories / Laboratoires N... | Rolphton | NaN | NaN | NaN | ON | 46.1868 | -77.6578 |
| 287 | Nuclear Power Demonstration | NaN | Canadian Nuclear Laboratories / Laboratoires N... | Rolphton | NaN | NaN | NaN | ON | 46.1868 | -77.6578 |
288 rows × 10 columns
# Cleaning the geography dataframe:
# Reassigning (instead of inplace=True on a slice of df_cnl) avoids pandas' SettingWithCopyWarning.
df_cnl_geography = df_cnl_geography.drop_duplicates().reset_index(drop=True)
df_cnl_geography
| | Facility Name | NPRI ID | Company Name | City | CSD | CA or CMA | Economic Region | Province | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Chalk River Laboratories | 3147.0 | Canadian Nuclear Laboratories / Laboratoires N... | Chalk River | Deep River | Renfrew | Kingston--Pembroke | ON | 46.0554 | -77.3628 |
| 1 | Whiteshell Laboratories | 7434.0 | Canadian Nuclear Laboratories / Laboratoires N... | Pinawa | Pinawa | Division No. 1 | Southeast / Sud-est | MB | 50.1789 | -96.0604 |
| 2 | Port Granby Project | 30760.0 | Canadian Nuclear Laboratories / Laboratoires N... | Clarington | NaN | NaN | NaN | ON | 43.9106 | -78.4511 |
| 3 | Port Hope Project | 30761.0 | Canadian Nuclear Laboratories / Laboratoires N... | Port Hope | NaN | NaN | NaN | ON | 43.9608 | -78.3407 |
| 4 | Douglas Point | NaN | Canadian Nuclear Laboratories / Laboratoires N... | Tiverton | Kincardine | NaN | Stratford--Bruce Peninsula | ON | 44.3267 | -81.6000 |
| 5 | Nuclear Power Demonstration | NaN | Canadian Nuclear Laboratories / Laboratoires N... | Rolphton | NaN | NaN | NaN | ON | 46.1868 | -77.6578 |
# Cleaning the NRM (Not Required to Monitor) values so I can convert the columns to numeric for later plotting.
# The result is assigned back because calling replace(..., inplace=True) on a selected column may not update df_cnl in recent pandas versions.
df_cnl['Stack Emissions'] = df_cnl['Stack Emissions'].replace('NRM | NRS', 0)
df_cnl['Direct Discharge'] = df_cnl['Direct Discharge'].replace('NRM | NRS', 0)
# Replacing "<0.01" with 0.01 so it can be converted to numeric, and correcting "3.08+10" to 3.08E+10 & "4.43+07" to 4.43E+07 (rows 29 & 30).
df_cnl['Stack Emissions'] = df_cnl['Stack Emissions'].replace('<0.01', 0.01)
df_cnl['Direct Discharge'] = df_cnl['Direct Discharge'].replace('3.08+10', 3.08E+10)
df_cnl['Direct Discharge'] = df_cnl['Direct Discharge'].replace('4.43+07', 4.43E+07)
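
The two malformed values above were corrected by hand. If more entries of that kind turned up, a pattern-based repair could be considered instead. The sketch below is an assumption-laden illustration, not part of the original analysis: `repair_sci_notation` and `MISSING_E` are hypothetical names, and the pattern only covers a plain mantissa followed directly by a signed exponent.

```python
import re

# Matches a mantissa directly followed by a signed exponent with no "E",
# e.g. "3.08+10" or "4.43+07".
MISSING_E = re.compile(r"^(\d+(?:\.\d+)?)([+-]\d+)$")


def repair_sci_notation(value):
    """Insert the missing 'E' when a string looks like '3.08+10'; return anything else unchanged."""
    if isinstance(value, str):
        match = MISSING_E.match(value.strip())
        if match:
            return f"{match.group(1)}E{match.group(2)}"
    return value


repaired = [repair_sci_notation(v) for v in ["3.08+10", "4.43+07", "1.50E+13", "NRM | NRS"]]
print(repaired)
```

Such a function could be applied with `Series.map` before `pd.to_numeric`. A string like "5-3" would also match the pattern, so the repaired values are worth reviewing rather than trusting blindly.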
# Converted columns to numeric for plotting:
df_cnl['Stack Emissions'] = pd.to_numeric(df_cnl['Stack Emissions'])
df_cnl['Direct Discharge'] = pd.to_numeric(df_cnl['Direct Discharge'])
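
Replacing NRM with 0 means that, once the columns are numeric, "not monitored" can no longer be told apart from a reported zero. A possible alternative, sketched here on a hypothetical `sample` frame rather than on `df_cnl` itself, keeps a boolean flag and lets `pd.to_numeric(errors="coerce")` turn the marker into NaN. The `Stack Emissions_nrm` column name is an assumption made for illustration.

```python
import pandas as pd

# Hypothetical sample covering the kinds of raw values seen in the column.
sample = pd.DataFrame({
    "Stack Emissions": ["2.49E+13", "0.00E+00", "NRM | NRS", "<0.01"],
})

# Remember which cells carried the marker before the conversion discards it.
sample["Stack Emissions_nrm"] = sample["Stack Emissions"].eq("NRM | NRS")

# Strip the "<" of below-limit entries, then coerce: anything still
# non-numeric (here only the NRM marker) becomes NaN instead of 0.
cleaned = sample["Stack Emissions"].str.replace("<", "", regex=False)
sample["Stack Emissions"] = pd.to_numeric(cleaned, errors="coerce")
print(sample)
```

With this variant a line plot shows a gap where a value was NRM, while a reported `0.00E+00` still plots as zero.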
# I noticed "Port Hope Project" has 4 extra values in 2017 & 2018 for Radium-226 & Uranium:
df_cnl[(df_cnl['Year'].isin([2017, 2018])) & (df_cnl['Substance Name (English)'].isin(['Radium-226', 'Uranium'])) & (df_cnl['Facility Name'] == 'Port Hope Project')].sort_values(['Substance Name (English)'])[['Year', 'Facility Name', 'Substance Name (English)', 'Units', 'Stack Emissions', 'Direct Discharge', 'Footnotes']]
# Note that the 0.0 values were "NRM" before I changed them.
# Footnote explains: Releases from non-routine operations.
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge | Footnotes |
|---|---|---|---|---|---|---|---|
| 122 | 2018 | Port Hope Project | Radium-226 | Bq | 0.0 | 7.000000e+05 | NaN |
| 124 | 2018 | Port Hope Project | Radium-226 | Bq | 0.0 | 5.680000e+09 | Releases from non-routine operations \| Rejets ... |
| 159 | 2017 | Port Hope Project | Radium-226 | Bq | 0.0 | 8.000000e+05 | NaN |
| 161 | 2017 | Port Hope Project | Radium-226 | Bq | 0.0 | 1.590000e+10 | Releases from non-routine operations \| Rejets ... |
| 123 | 2018 | Port Hope Project | Uranium | kg | 0.0 | 5.000000e-01 | NaN |
| 125 | 2018 | Port Hope Project | Uranium | kg | 0.0 | 1.460000e+01 | Releases from non-routine operations \| Rejets ... |
| 160 | 2017 | Port Hope Project | Uranium | kg | 0.0 | 1.000000e-01 | NaN |
| 162 | 2017 | Port Hope Project | Uranium | kg | 0.0 | 1.101000e+02 | Releases from non-routine operations \| Rejets ... |
(5) Why is there extra Direct Discharge for Uranium & Radium-226 in 2017 & 2018?
df_cnl.drop(columns=['NPRI ID', 'Company Name', 'City', 'CSD', 'CA or CMA', 'Economic Region', 'Province', 'Latitude', 'Longitude', 'Substance Name (French) | Nom de substance (Français)', 'Footnotes'], inplace=True)
df_cnl.head()
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 0 | 2021 | Chalk River Laboratories | Elemental Tritium (HT) | Bq | 2.080000e+12 | 0.000000e+00 |
| 1 | 2021 | Chalk River Laboratories | Tritium (HTO) | Bq | 2.490000e+13 | 1.500000e+13 |
| 2 | 2021 | Chalk River Laboratories | Carbon-14 | Bq | 0.000000e+00 | 0.000000e+00 |
| 3 | 2021 | Chalk River Laboratories | Total noble gases | Bq-MeV | 0.000000e+00 | 0.000000e+00 |
| 4 | 2021 | Chalk River Laboratories | Iodine-125 | Bq | 1.760000e+06 | 0.000000e+00 |
# I'm going to aggregate the extra Port Hope values, to be able to plot:
df_cnl = df_cnl.groupby(['Year', 'Facility Name', 'Substance Name (English)', 'Units'],as_index=False).agg({'Stack Emissions': 'sum', 'Direct Discharge': 'sum'})
df_cnl[(df_cnl['Year'].isin([2017, 2018])) & (df_cnl['Substance Name (English)'].isin(['Radium-226', 'Uranium'])) & (df_cnl['Facility Name'] == 'Port Hope Project')]
df_cnl.head()
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 0 | 2013 | Chalk River Laboratories | Argon-41 | Bq | 8.460000e+15 | 0.0 |
| 1 | 2013 | Chalk River Laboratories | Carbon-14 | Bq | 5.740000e+11 | 0.0 |
| 2 | 2013 | Chalk River Laboratories | Elemental Tritium (HT) | Bq | 1.590000e+12 | 0.0 |
| 3 | 2013 | Chalk River Laboratories | Estimated public dose (see footnote) | mSv/a | 5.914000e-02 | 0.0 |
| 4 | 2013 | Chalk River Laboratories | Iodine-125 | Bq | 2.840000e+08 | 0.0 |
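
Before summing, it can help to list which key combinations actually occur more than once, so the aggregation does not silently merge duplicates that were not expected. The following sketch rebuilds the four Port Hope Project rows for 2017 shown earlier as a standalone `sample` frame (a hypothetical name) and applies the same `groupby` keys.

```python
import pandas as pd

# Port Hope Project rows for 2017, copied from the table shown earlier.
sample = pd.DataFrame({
    "Year": [2017, 2017, 2017, 2017],
    "Facility Name": ["Port Hope Project"] * 4,
    "Substance Name (English)": ["Radium-226", "Radium-226", "Uranium", "Uranium"],
    "Units": ["Bq", "Bq", "kg", "kg"],
    "Stack Emissions": [0.0, 0.0, 0.0, 0.0],
    "Direct Discharge": [8.00e05, 1.59e10, 0.1, 110.1],
})
keys = ["Year", "Facility Name", "Substance Name (English)", "Units"]

# keep=False marks every member of a duplicated key, not just the repeats.
duplicated_rows = sample[sample.duplicated(subset=keys, keep=False)]
print(duplicated_rows)

# Same aggregation as in the notebook: one row per key, values summed.
totals = sample.groupby(keys, as_index=False).agg(
    {"Stack Emissions": "sum", "Direct Discharge": "sum"}
)
print(totals)
```

Note that the summed figure combines the footnoted non-routine release with the other reported value, so the distinction between the two is no longer visible after aggregation.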
# I'm saving the clean dataframe to do a dashboard in Tableau.
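# Forward slashes avoid the invalid "\D" and "\d" escape sequences of the backslash path and also work on Windows.
df_cnl.to_csv("./Datasets/df_cnl.csv", index=True, header=True)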
df_cnl['Substance Name (English)'].unique()
array(['Argon-41', 'Carbon-14', 'Elemental Tritium (HT)',
'Estimated public dose (see footnote)', 'Iodine-125', 'Iodine-131',
'Particulate gross alpha', 'Particulate gross beta',
'Strontium-90', 'Total noble gases', 'Tritium (HTO)', 'Xenon-133',
'Radium-226', 'Uranium', 'Cesium-137', 'Americium-241',
'Plutonium-238', 'Plutonium-239/240', 'Uranium-total'],
dtype=object)
df_cnl_c14 = df_cnl[df_cnl['Substance Name (English)'] == 'Carbon-14']
df_cnl_c14
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 1 | 2013 | Chalk River Laboratories | Carbon-14 | Bq | 5.740000e+11 | 0.0 |
| 27 | 2014 | Chalk River Laboratories | Carbon-14 | Bq | 8.690000e+11 | 0.0 |
| 57 | 2015 | Chalk River Laboratories | Carbon-14 | Bq | 3.770000e+11 | 0.0 |
| 87 | 2016 | Chalk River Laboratories | Carbon-14 | Bq | 4.850000e+11 | 0.0 |
| 118 | 2017 | Chalk River Laboratories | Carbon-14 | Bq | 4.910000e+11 | 0.0 |
| 152 | 2018 | Chalk River Laboratories | Carbon-14 | Bq | 2.590000e+11 | 0.0 |
| 162 | 2018 | Douglas Point | Carbon-14 | Bq | 1.510000e+09 | 0.0 |
| 187 | 2019 | Chalk River Laboratories | Carbon-14 | Bq | 3.440000e+10 | 0.0 |
| 221 | 2020 | Chalk River Laboratories | Carbon-14 | Bq | 2.610000e+10 | 0.0 |
| 255 | 2021 | Chalk River Laboratories | Carbon-14 | Bq | 0.000000e+00 | 0.0 |
(6) Why does Douglas Point only report Carbon-14 Emissions for 2018?
# No Direct Discharge reported (addressed in question (3)). Removing Douglas to be able to plot.
df_cnl_c14 = df_cnl_c14[df_cnl_c14['Facility Name'] != 'Douglas Point']
plt.figure(figsize=(16,6))
year = df_cnl_c14['Year'].unique()
for facility in df_cnl_c14['Facility Name'].unique():
    plt.plot(year, df_cnl_c14[df_cnl_c14['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Carbon-14 - Stack Emissions [Bq]', size=12)
plt.legend(df_cnl_c14['Facility Name'].unique(), loc='upper right')
plt.grid()
plt.show()
(7) Why does Chalk River Laboratories have a peak in 2014 & much lower values since 2019 reaching 0 in 2021?
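
To put numbers on "peak" and "much lower", the year-over-year relative change can be computed from the Chalk River Laboratories values in the Carbon-14 table above. This is a minimal sketch: the 50% `THRESHOLD` is an arbitrary assumption chosen for illustration, not a regulatory criterion.

```python
import pandas as pd

# Chalk River Laboratories Carbon-14 stack emissions [Bq], copied from the table above.
c14 = pd.Series(
    [5.74e11, 8.69e11, 3.77e11, 4.85e11, 4.91e11, 2.59e11, 3.44e10, 2.61e10, 0.0],
    index=pd.Index(range(2013, 2022), name="Year"),
)

# Relative change versus the previous year; the first year has no predecessor (NaN).
yoy = c14 / c14.shift(1) - 1

# Flag years that moved by more than the (arbitrary) threshold in either direction.
THRESHOLD = 0.5
flagged = yoy[yoy.abs() > THRESHOLD]
print(flagged)
```

With these values, the flagged years are 2014 (increase), then 2015, 2019, and 2021 (decreases), which lines up with the 2014 peak and the drop from 2019 onward raised in the question.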
df_cnl_epd = df_cnl[df_cnl['Substance Name (English)'] == 'Estimated public dose (see footnote)']
df_cnl_epd.head()
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 3 | 2013 | Chalk River Laboratories | Estimated public dose (see footnote) | mSv/a | 0.05914 | 0.0 |
| 12 | 2013 | Douglas Point | Estimated public dose (see footnote) | mSv/a | 0.00130 | 0.0 |
| 29 | 2014 | Chalk River Laboratories | Estimated public dose (see footnote) | mSv/a | 0.06000 | 0.0 |
| 38 | 2014 | Douglas Point | Estimated public dose (see footnote) | mSv/a | 0.00200 | 0.0 |
| 41 | 2014 | Nuclear Power Demonstration | Estimated public dose (see footnote) | mSv/a | 0.01000 | 0.0 |
# Whiteshell Laboratories, Port Granby Project, Port Hope Project, Nuclear Power Demonstration are missing 2013. I will have to plot them separately.
# No Direct Discharge reported (addressed in question (3)).
df_cnl_epd2 = df_cnl_epd[df_cnl_epd['Facility Name'].isin(['Chalk River Laboratories', 'Douglas Point'])]
plt.figure(figsize=(16,6))
year = df_cnl_epd2['Year'].unique()
for facility in df_cnl_epd2['Facility Name'].unique():
    plt.plot(year, df_cnl_epd2[df_cnl_epd2['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Estimated public dose - Stack Emissions [mSv/a]', size=12)
plt.legend(df_cnl_epd2['Facility Name'].unique(), loc='upper left')
plt.grid()
plt.show()
(8) Why does Chalk River Laboratories have a spike from 2015 to 2017 & then plummet?
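
The facilities are plotted in separate groups because `plt.plot(year, ...)` needs every facility to have a value for every year in `year`. A possible alternative, shown as a sketch on the five rows from the `head()` output above, is to pivot the data so that each facility becomes a column and unreported years become NaN. The names `epd` and `wide` are hypothetical.

```python
import pandas as pd

# Estimated public dose rows [mSv/a] from the head() output above.
epd = pd.DataFrame({
    "Year": [2013, 2013, 2014, 2014, 2014],
    "Facility Name": [
        "Chalk River Laboratories", "Douglas Point",
        "Chalk River Laboratories", "Douglas Point",
        "Nuclear Power Demonstration",
    ],
    "Stack Emissions": [0.05914, 0.00130, 0.06000, 0.00200, 0.01000],
})

# One column per facility, one row per year; years a facility did not
# report become NaN instead of shortening that facility's series.
wide = epd.pivot(index="Year", columns="Facility Name", values="Stack Emissions")
print(wide)
```

Each column of `wide` can then be plotted against `wide.index`. Matplotlib leaves a gap where a value is NaN, so facilities that start reporting later could share one figure with the others.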
# No Direct Discharge reported (addressed in question (3)).
df_cnl_epd3 = df_cnl_epd[df_cnl_epd['Facility Name'].isin(['Whiteshell Laboratories', 'Port Granby Project', 'Port Hope Project', 'Nuclear Power Demonstration'])]
plt.figure(figsize=(16,6))
year = df_cnl_epd3['Year'].unique()
for facility in df_cnl_epd3['Facility Name'].unique():
    plt.plot(year, df_cnl_epd3[df_cnl_epd3['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Estimated public dose - Stack Emissions [mSv/a]', size=12)
plt.legend(df_cnl_epd3['Facility Name'].unique(), loc='upper right')
plt.grid()
plt.show()
(9) Why don't Nuclear Power Demonstration, Port Granby, Port Hope, & Whiteshell Laboratories report anything for 2013?
(10) Why does Port Hope have a peak in 2015 & a spike between 2018 & 2020?
(11) Why does Port Granby have a peak in 2019?
df_cnl_alpha = df_cnl[df_cnl['Substance Name (English)'] == 'Particulate gross alpha']
df_cnl_alpha.head()
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 6 | 2013 | Chalk River Laboratories | Particulate gross alpha | Bq | 0.0 | 46800000.0 |
| 22 | 2013 | Whiteshell Laboratories | Particulate gross alpha | Bq | 92400.0 | 114000000.0 |
| 32 | 2014 | Chalk River Laboratories | Particulate gross alpha | Bq | 0.0 | 907000000.0 |
| 52 | 2014 | Whiteshell Laboratories | Particulate gross alpha | Bq | 88200.0 | 47600000.0 |
| 62 | 2015 | Chalk River Laboratories | Particulate gross alpha | Bq | 0.0 | 694000000.0 |
# Douglas Point starts in 2016. I will have to plot it separately.
# Chalk River Laboratories Stack Emissions are 0, but they were previously 'NRM' (addressed in question (2)).
df_cnl_alpha2 = df_cnl_alpha[df_cnl_alpha['Facility Name'] != 'Douglas Point']
plt.figure(figsize=(16,6))
year = df_cnl_alpha2['Year'].unique()
for facility in df_cnl_alpha2['Facility Name'].unique():
    plt.plot(year, df_cnl_alpha2[df_cnl_alpha2['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Particulate gross alpha - Stack Emissions [Bq]', size=12)
plt.legend(df_cnl_alpha2['Facility Name'].unique(), loc='center left')
plt.grid()
plt.show()
df_cnl_alpha3 = df_cnl_alpha[df_cnl_alpha['Facility Name'] == 'Douglas Point']
plt.figure(figsize=(16,6))
year = df_cnl_alpha3['Year'].unique()
for facility in df_cnl_alpha3['Facility Name'].unique():
    plt.plot(year, df_cnl_alpha3[df_cnl_alpha3['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Particulate gross alpha - Stack Emissions [Bq]', size=12)
plt.legend(df_cnl_alpha3['Facility Name'].unique(), loc='upper left')
plt.grid()
plt.show()
(12) Why doesn't Douglas Point report anything for 2013 to 2015 (both stack & direct)?
(13) Why does Douglas Point have a peak in 2020?
plt.figure(figsize=(16,6))
year = df_cnl_alpha2['Year'].unique()
for facility in df_cnl_alpha2['Facility Name'].unique():
    plt.plot(year, df_cnl_alpha2[df_cnl_alpha2['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Particulate gross alpha - Direct Discharge [Bq]', size=12)
plt.legend(df_cnl_alpha2['Facility Name'].unique(), loc='upper right')
plt.grid()
plt.show()
(14) Why is Chalk River so much higher than Whiteshell Laboratories, & why does it have a peak in 2014?
plt.figure(figsize=(16,6))
year = df_cnl_alpha3['Year'].unique()
for facility in df_cnl_alpha3['Facility Name'].unique():
    plt.plot(year, df_cnl_alpha3[df_cnl_alpha3['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Particulate gross alpha - Direct Discharge [Bq]', size=12)
plt.legend(df_cnl_alpha3['Facility Name'].unique(), loc='upper left')
plt.grid()
plt.show()
(15) Why does Douglas Point have peaks in 2017 & 2018?
df_cnl_beta = df_cnl[df_cnl['Substance Name (English)'] == 'Particulate gross beta']
df_cnl_beta.head()
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 7 | 2013 | Chalk River Laboratories | Particulate gross beta | Bq | 0.0 | 3.020000e+09 |
| 13 | 2013 | Douglas Point | Particulate gross beta | Bq | 0.0 | 5.310000e+07 |
| 15 | 2013 | Nuclear Power Demonstration | Particulate gross beta | Bq | 66300.0 | 9.760000e+05 |
| 23 | 2013 | Whiteshell Laboratories | Particulate gross beta | Bq | 229000.0 | 3.860000e+08 |
| 33 | 2014 | Chalk River Laboratories | Particulate gross beta | Bq | 0.0 | 2.620000e+11 |
# Chalk River Laboratories & Douglas Point Stack Emissions are 0, but they were previously 'NRM' (addressed in question (2)).
plt.figure(figsize=(16,6))
year = df_cnl_beta['Year'].unique()
for facility in df_cnl_beta['Facility Name'].unique():
    plt.plot(year, df_cnl_beta[df_cnl_beta['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Particulate gross beta - Stack Emissions [Bq]', size=12)
plt.legend(df_cnl_beta['Facility Name'].unique(), loc='upper right')
plt.grid()
plt.show()
(16) Why does Whiteshell Laboratories have higher emissions, with peaks in 2014 & 2019?
(17) Why does Nuclear Power Demonstration have a peak in 2017?
(18) Why does Douglas Point have a peak in 2020?
plt.figure(figsize=(16,6))
year = df_cnl_beta['Year'].unique()
for facility in df_cnl_beta['Facility Name'].unique():
    plt.plot(year, df_cnl_beta[df_cnl_beta['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Particulate gross beta - Direct Discharge [Bq]', size=12)
plt.legend(df_cnl_beta['Facility Name'].unique(), loc='upper right')
plt.grid()
plt.show()
(19) Why does Chalk River Laboratories have higher discharge, with a peak in 2014?
# I'm plotting without Chalk River to see the rest.
df_cnl_beta2 = df_cnl_beta[df_cnl_beta['Facility Name'] != 'Chalk River Laboratories']
plt.figure(figsize=(16,6))
year = df_cnl_beta2['Year'].unique()
for facility in df_cnl_beta2['Facility Name'].unique():
    plt.plot(year, df_cnl_beta2[df_cnl_beta2['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Particulate gross beta - Direct Discharge [Bq]', size=12)
plt.legend(df_cnl_beta2['Facility Name'].unique(), loc='best')
plt.grid()
plt.show()
(20) Why does Whiteshell Laboratories have a peak in 2013 & a spike in 2019/2020/2021?
(21) Why does Nuclear Power Demonstration have peaks in 2017 & 2020?
df_cnl_radium = df_cnl[df_cnl['Substance Name (English)'] == 'Radium-226']
df_cnl_radium.head()
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 17 | 2013 | Port Granby Project | Radium-226 | Bq | 0.0 | 5000000.0 |
| 19 | 2013 | Port Hope Project | Radium-226 | Bq | 0.0 | 6200000.0 |
| 45 | 2014 | Port Granby Project | Radium-226 | Bq | 0.0 | 5400000.0 |
| 48 | 2014 | Port Hope Project | Radium-226 | Bq | 0.0 | 7700000.0 |
| 75 | 2015 | Port Granby Project | Radium-226 | Bq | 0.0 | 4600000.0 |
# No Stack Emissions reported (addressed in question (2)).
plt.figure(figsize=(16,6))
year = df_cnl_radium['Year'].unique()
for facility in df_cnl_radium['Facility Name'].unique():
    plt.plot(year, df_cnl_radium[df_cnl_radium['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Radium-226 - Direct Discharge [Bq]', size=12)
plt.legend(df_cnl_radium['Facility Name'].unique(), loc='upper left')
plt.grid()
plt.show()
(22) Why does Port Hope have a huge peak in 2017/2018 (x10,000)? Looking at the data, it comes from the "releases from non-routine operations".
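
The order of magnitude quoted in the question can be checked directly from the Port Hope Project Radium-226 rows for 2017 and 2018 shown earlier. The sketch assumes that the rows without a footnote are the routine releases and the footnoted rows are the non-routine ones; the dictionary names are hypothetical.

```python
# Port Hope Project Radium-226 direct discharge [Bq], from the 2017/2018 rows shown earlier.
# Assumption: rows without a footnote are routine releases.
routine = {2017: 8.00e05, 2018: 7.00e05}
non_routine = {2017: 1.59e10, 2018: 5.68e09}

# Ratio of the footnoted (non-routine) release to the other reported value.
ratios = {year: non_routine[year] / routine[year] for year in sorted(routine)}

for year, ratio in ratios.items():
    print(f"{year}: non-routine release is about {ratio:,.0f} times the routine one")
```

Under that assumption the factor is close to 20,000 for 2017 and about 8,000 for 2018, so "x10,000" is a fair order-of-magnitude description.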
df_cnl_strontium = df_cnl[df_cnl['Substance Name (English)'] == 'Strontium-90']
df_cnl_strontium.head()
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 8 | 2013 | Chalk River Laboratories | Strontium-90 | Bq | 0.0 | 1.510000e+10 |
| 24 | 2013 | Whiteshell Laboratories | Strontium-90 | Bq | 0.0 | 6.970000e+07 |
| 34 | 2014 | Chalk River Laboratories | Strontium-90 | Bq | 0.0 | 2.260000e+11 |
| 54 | 2014 | Whiteshell Laboratories | Strontium-90 | Bq | 0.0 | 6.610000e+07 |
| 64 | 2015 | Chalk River Laboratories | Strontium-90 | Bq | 0.0 | 6.700000e+10 |
# No Stack Emissions reported (addressed in question (2)).
plt.figure(figsize=(16,6))
year = df_cnl_strontium['Year'].unique()
for facility in df_cnl_strontium['Facility Name'].unique():
    plt.plot(year, df_cnl_strontium[df_cnl_strontium['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Strontium-90 - Direct Discharge [Bq]', size=12)
plt.legend(df_cnl_strontium['Facility Name'].unique(), loc='upper right')
plt.grid()
plt.show()
(23) Why does Chalk River Laboratories have a peak in 2014?
df_cnl_hto = df_cnl[df_cnl['Substance Name (English)'] == 'Tritium (HTO)']
df_cnl_hto.head()
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 10 | 2013 | Chalk River Laboratories | Tritium (HTO) | Bq | 2.460000e+14 | 2.430000e+12 |
| 14 | 2013 | Douglas Point | Tritium (HTO) | Bq | 1.590000e+11 | 8.730000e+10 |
| 16 | 2013 | Nuclear Power Demonstration | Tritium (HTO) | Bq | 6.860000e+10 | 1.410000e+11 |
| 25 | 2013 | Whiteshell Laboratories | Tritium (HTO) | Bq | 3.520000e+10 | 0.000000e+00 |
| 36 | 2014 | Chalk River Laboratories | Tritium (HTO) | Bq | 2.600000e+14 | 3.070000e+13 |
plt.figure(figsize=(16,6))
year = df_cnl_hto['Year'].unique()
for facility in df_cnl_hto['Facility Name'].unique():
    plt.plot(year, df_cnl_hto[df_cnl_hto['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Tritium (HTO) - Stack Emissions [Bq]', size=12)
plt.legend(df_cnl_hto['Facility Name'].unique(), loc='upper right')
plt.grid()
plt.show()
(24) Why does Chalk River Laboratories have higher emissions, & why do they come down in 2020?
# I'm plotting without Chalk River to see the rest.
df_cnl_hto2 = df_cnl_hto[df_cnl_hto['Facility Name'] != 'Chalk River Laboratories']
plt.figure(figsize=(16,6))
year = df_cnl_hto2['Year'].unique()
for facility in df_cnl_hto2['Facility Name'].unique():
    plt.plot(year, df_cnl_hto2[df_cnl_hto2['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Tritium (HTO) - Stack Emissions [Bq]', size=12)
plt.legend(df_cnl_hto2['Facility Name'].unique(), loc='upper left')
plt.grid()
plt.show()
(25) Why does Nuclear Power Demonstration have a peak in 2017?
(26) Why does Douglas Point have a peak in 2018?
# No Direct Discharge reported for Whiteshell Laboratories (NRM, addressed in question (3)).
plt.figure(figsize=(16,6))
year = df_cnl_hto['Year'].unique()
for facility in df_cnl_hto['Facility Name'].unique():
    plt.plot(year, df_cnl_hto[df_cnl_hto['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Tritium (HTO) - Direct Discharge [Bq]', size=12)
plt.legend(df_cnl_hto['Facility Name'].unique(), loc='upper right')
plt.grid()
plt.show()
(27) Why does Chalk River Laboratories have higher discharge, with a spike from 2014 to 2017?
# I'm plotting without Chalk River to see the rest.
plt.figure(figsize=(16,6))
year = df_cnl_hto2['Year'].unique()
for facility in df_cnl_hto2['Facility Name'].unique():
    plt.plot(year, df_cnl_hto2[df_cnl_hto2['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Tritium (HTO) - Direct Discharge [Bq]', size=12)
plt.legend(df_cnl_hto2['Facility Name'].unique(), loc='upper right')
plt.grid()
plt.show()
(28) Why does Nuclear Power Demonstration have a spike from 2013 to 2015?
(29) Why does Douglas Point have a peak in 2013?
df_cnl_uranium = df_cnl[df_cnl['Substance Name (English)'] == 'Uranium']
df_cnl_uranium.head()
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 18 | 2013 | Port Granby Project | Uranium | kg | 0.0 | 49.2 |
| 20 | 2013 | Port Hope Project | Uranium | kg | 0.0 | 25.4 |
| 46 | 2014 | Port Granby Project | Uranium | kg | 0.0 | 36.7 |
| 49 | 2014 | Port Hope Project | Uranium | kg | 0.0 | 23.0 |
| 76 | 2015 | Port Granby Project | Uranium | kg | 0.0 | 29.0 |
# No Stack Emissions reported (addressed in question (2)).
plt.figure(figsize=(16,6))
year = df_cnl_uranium['Year'].unique()
for facility in df_cnl_uranium['Facility Name'].unique():
    plt.plot(year, df_cnl_uranium[df_cnl_uranium['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Uranium - Direct Discharge [kg]', size=12)
plt.legend(df_cnl_uranium['Facility Name'].unique(), loc='upper left')
plt.grid()
plt.show()
(30) Why does Port Granby have a spike from 2013 to 2016?
(31) Why does Port Hope have a peak in 2017, & why does it come down so much after 2018? Looking at the data, the peak comes from the "releases from non-routine operations".
Addressed in question (5).
All the other substances are individual to each location:
Summary of substances per location:
df_cnl_a241 = df_cnl[df_cnl['Substance Name (English)'] == 'Americium-241']
df_cnl_a241.head()
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 141 | 2017 | Whiteshell Laboratories | Americium-241 | Bq | 0.0 | 5100000.0 |
| 176 | 2018 | Whiteshell Laboratories | Americium-241 | Bq | 0.0 | 4210000.0 |
| 210 | 2019 | Whiteshell Laboratories | Americium-241 | Bq | 0.0 | 20100000.0 |
| 244 | 2020 | Whiteshell Laboratories | Americium-241 | Bq | 0.0 | 18000000.0 |
# No Stack Emissions reported (addressed in question (2)). Missing Direct Discharge values from 2013 to 2016.
plt.figure(figsize=(16,6))
year = df_cnl_a241['Year'].unique()
for facility in df_cnl_a241['Facility Name'].unique():
    plt.plot(year, df_cnl_a241[df_cnl_a241['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Americium-241 - Direct Discharge [Bq]', size=12)
plt.legend(df_cnl_a241['Facility Name'].unique(), loc='upper left')
plt.grid()
plt.show()
(32) Why wasn't it reported from 2013 to 2016 & 2021?
(33) What happened in 2019 & 2020 that the discharge increased so much?
df_cnl_argon = df_cnl[df_cnl['Substance Name (English)'] == 'Argon-41']
df_cnl_argon.head()
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 0 | 2013 | Chalk River Laboratories | Argon-41 | Bq | 8.460000e+15 | 0.0 |
| 26 | 2014 | Chalk River Laboratories | Argon-41 | Bq | 9.370000e+15 | 0.0 |
| 56 | 2015 | Chalk River Laboratories | Argon-41 | Bq | 1.290000e+16 | 0.0 |
| 86 | 2016 | Chalk River Laboratories | Argon-41 | Bq | 1.070000e+16 | 0.0 |
| 117 | 2017 | Chalk River Laboratories | Argon-41 | Bq | 1.160000e+16 | 0.0 |
# No Direct Discharge reported (addressed in question (3)). Zero values for 2019 - 2021 (not NRM, addressed in question (4))
plt.figure(figsize=(16,6))
year = df_cnl_argon['Year'].unique()
for facility in df_cnl_argon['Facility Name'].unique():
    plt.plot(year, df_cnl_argon[df_cnl_argon['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Argon-41 - Stack Emissions [Bq]', size=12)
plt.legend(df_cnl_argon['Facility Name'].unique(), loc='upper left')
plt.grid()
plt.show()
(34) Why does Chalk River report zero from 2019 to 2021?
df_cnl_cesium = df_cnl[df_cnl['Substance Name (English)'] == 'Cesium-137']
df_cnl_cesium.head()
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 21 | 2013 | Whiteshell Laboratories | Cesium-137 | Bq | 0.0 | 64000000.0 |
| 50 | 2014 | Whiteshell Laboratories | Cesium-137 | Bq | 0.0 | 26600000.0 |
| 80 | 2015 | Whiteshell Laboratories | Cesium-137 | Bq | 0.0 | 16500000.0 |
| 111 | 2016 | Whiteshell Laboratories | Cesium-137 | Bq | 0.0 | 12800000.0 |
| 142 | 2017 | Whiteshell Laboratories | Cesium-137 | Bq | 0.0 | 18900000.0 |
# No Stack Emissions reported (addressed in question (2)).
plt.figure(figsize=(16,6))
year = df_cnl_cesium['Year'].unique()
for facility in df_cnl_cesium['Facility Name'].unique():
    plt.plot(year, df_cnl_cesium[df_cnl_cesium['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Cesium-137 - Direct Discharge [Bq]', size=12)
plt.legend(df_cnl_cesium['Facility Name'].unique(), loc='upper right')
plt.grid()
plt.show()
(35) Why does Whiteshell Laboratories have a peak in 2013?
df_cnl_ht = df_cnl[df_cnl['Substance Name (English)'] == 'Elemental Tritium (HT)']
df_cnl_ht.head()
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 2 | 2013 | Chalk River Laboratories | Elemental Tritium (HT) | Bq | 1.590000e+12 | 0.0 |
| 28 | 2014 | Chalk River Laboratories | Elemental Tritium (HT) | Bq | 1.370000e+12 | 0.0 |
| 58 | 2015 | Chalk River Laboratories | Elemental Tritium (HT) | Bq | 4.770000e+12 | 0.0 |
| 88 | 2016 | Chalk River Laboratories | Elemental Tritium (HT) | Bq | 2.550000e+12 | 0.0 |
| 119 | 2017 | Chalk River Laboratories | Elemental Tritium (HT) | Bq | 4.640000e+12 | 0.0 |
# No Direct Discharge reported (addressed in question (3)).
plt.figure(figsize=(16,6))
year = df_cnl_ht['Year'].unique()
for facility in df_cnl_ht['Facility Name'].unique():
    plt.plot(year, df_cnl_ht[df_cnl_ht['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Elemental Tritium (HT) - Stack Emissions [Bq]', size=12)
plt.legend(df_cnl_ht['Facility Name'].unique(), loc='upper left')
plt.grid()
plt.show()
(36) Why does Chalk River Laboratories have a peak in 2015 & a spike from 2017 to 2020 with a peak in 2018?
df_cnl_i125 = df_cnl[df_cnl['Substance Name (English)'] == 'Iodine-125']
df_cnl_i125.head()
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 4 | 2013 | Chalk River Laboratories | Iodine-125 | Bq | 284000000.0 | 0.0 |
| 30 | 2014 | Chalk River Laboratories | Iodine-125 | Bq | 162000000.0 | 0.0 |
| 60 | 2015 | Chalk River Laboratories | Iodine-125 | Bq | 344000000.0 | 0.0 |
| 90 | 2016 | Chalk River Laboratories | Iodine-125 | Bq | 291000000.0 | 0.0 |
| 121 | 2017 | Chalk River Laboratories | Iodine-125 | Bq | 530000000.0 | 0.0 |
# No Direct Discharge reported (addressed in question (3)).
plt.figure(figsize=(16,6))
year = df_cnl_i125['Year'].unique()
for facility in df_cnl_i125['Facility Name'].unique():
    plt.plot(year, df_cnl_i125[df_cnl_i125['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Iodine-125 - Stack Emissions [Bq]', size=12)
plt.legend(df_cnl_i125['Facility Name'].unique(), loc='upper left')
plt.grid()
plt.show()
(37) Why does Chalk River Laboratories have a peak in 2017, & why have the values plummeted since 2018?
df_cnl_i131 = df_cnl[df_cnl['Substance Name (English)'] == 'Iodine-131']
df_cnl_i131.head()
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 5 | 2013 | Chalk River Laboratories | Iodine-131 | Bq | 1.380000e+11 | 0.0 |
| 31 | 2014 | Chalk River Laboratories | Iodine-131 | Bq | 2.060000e+11 | 0.0 |
| 61 | 2015 | Chalk River Laboratories | Iodine-131 | Bq | 1.030000e+11 | 0.0 |
| 91 | 2016 | Chalk River Laboratories | Iodine-131 | Bq | 5.170000e+10 | 0.0 |
| 122 | 2017 | Chalk River Laboratories | Iodine-131 | Bq | 3.780000e+08 | 0.0 |
# No Direct Discharge reported (addressed in question (3)).
plt.figure(figsize=(16,6))
year = df_cnl_i131['Year'].unique()
for facility in df_cnl_i131['Facility Name'].unique():
    plt.plot(year, df_cnl_i131[df_cnl_i131['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Iodine-131 - Stack Emissions [Bq]', size=12)
plt.legend(df_cnl_i131['Facility Name'].unique(), loc='upper right')
plt.grid()
plt.show()
(38) Why does Chalk River Laboratories have a peak in 2014, & why have the values come down so much since 2017?
df_cnl_p238 = df_cnl[df_cnl['Substance Name (English)'] == 'Plutonium-238']
df_cnl_p238.head()
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 146 | 2017 | Whiteshell Laboratories | Plutonium-238 | Bq | 0.0 | 8690000.0 |
| 181 | 2018 | Whiteshell Laboratories | Plutonium-238 | Bq | 0.0 | 18400000.0 |
| 215 | 2019 | Whiteshell Laboratories | Plutonium-238 | Bq | 0.0 | 48600000.0 |
| 249 | 2020 | Whiteshell Laboratories | Plutonium-238 | Bq | 0.0 | 23900000.0 |
# No Stack Emissions reported (addressed in question (2)).
plt.figure(figsize=(16,6))
year = df_cnl_p238['Year'].unique()
for facility in df_cnl_p238['Facility Name'].unique():
    plt.plot(year, df_cnl_p238[df_cnl_p238['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Plutonium-238 - Direct Discharge [Bq]', size=12)
plt.legend(df_cnl_p238['Facility Name'].unique(), loc='upper left')
plt.grid()
plt.show()
(39) Why doesn't Whiteshell Laboratories report values from 2013 to 2016 & in 2021?
(40) Why does Whiteshell Laboratories have a peak in 2019?
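Reporting gaps like the one raised in question (39) can also be listed programmatically rather than read off the plots. The sketch below is a hypothetical helper (`missing_years` is not in the original notebook), and the expected reporting window of 2013 to 2021 is an assumption passed in by the caller. It only detects years with no row at all; a year that is present with a code such as NRM would need a separate check.

```python
import pandas as pd


def missing_years(df, first_year, last_year):
    """Return the years with no row for each facility/substance pair.

    Hypothetical helper: `first_year` and `last_year` describe the reporting
    window the analyst expects; they are not read from the dataset.
    """
    expected = set(range(first_year, last_year + 1))
    gaps = {}
    for key, group in df.groupby(["Facility Name", "Substance Name (English)"]):
        absent = sorted(expected - set(group["Year"]))
        if absent:
            gaps[key] = absent
    return gaps


# Demo using the Plutonium-238 years shown above (2017 to 2020).
demo = pd.DataFrame({
    "Year": [2017, 2018, 2019, 2020],
    "Facility Name": ["Whiteshell Laboratories"] * 4,
    "Substance Name (English)": ["Plutonium-238"] * 4,
})
gaps = missing_years(demo, 2013, 2021)
print(gaps)
```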
df_cnl_p239 = df_cnl[df_cnl['Substance Name (English)'] == 'Plutonium-239/240']
df_cnl_p239.head()
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 147 | 2017 | Whiteshell Laboratories | Plutonium-239/240 | Bq | 0.0 | 12000000.0 |
| 182 | 2018 | Whiteshell Laboratories | Plutonium-239/240 | Bq | 0.0 | 23200000.0 |
| 216 | 2019 | Whiteshell Laboratories | Plutonium-239/240 | Bq | 0.0 | 47000000.0 |
| 250 | 2020 | Whiteshell Laboratories | Plutonium-239/240 | Bq | 0.0 | 39400000.0 |
# No Stack Emissions reported (addressed in question (2)).
plt.figure(figsize=(16,6))
year = df_cnl_p239['Year'].unique()
for facility in df_cnl_p239['Facility Name'].unique():
    plt.plot(year, df_cnl_p239[df_cnl_p239['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Plutonium-239/240 - Direct Discharge [Bq]', size=12)
plt.legend(df_cnl_p239['Facility Name'].unique(), loc='upper left')
plt.grid()
plt.show()
(41) Why doesn't Whiteshell Laboratories report values from 2013 to 2016 & in 2021?
(42) Why does Whiteshell Laboratories have a peak in 2019?
df_cnl_noble = df_cnl[df_cnl['Substance Name (English)'] == 'Total noble gases']
df_cnl_noble.head()
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 9 | 2013 | Chalk River Laboratories | Total noble gases | Bq-MeV | 1.320000e+15 | 0.0 |
| 35 | 2014 | Chalk River Laboratories | Total noble gases | Bq-MeV | 2.110000e+15 | 0.0 |
| 65 | 2015 | Chalk River Laboratories | Total noble gases | Bq-MeV | 1.200000e+15 | 0.0 |
| 95 | 2016 | Chalk River Laboratories | Total noble gases | Bq-MeV | 3.970000e+14 | 0.0 |
| 126 | 2017 | Chalk River Laboratories | Total noble gases | Bq-MeV | 6.500000e+12 | 0.0 |
# No Direct Discharge reported (addressed in question (3)).
plt.figure(figsize=(16,6))
year = df_cnl_noble['Year'].unique()
for facility in df_cnl_noble['Facility Name'].unique():
    plt.plot(year, df_cnl_noble[df_cnl_noble['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Total noble gases - Stack Emissions [Bq-MeV]', size=12)
plt.legend(df_cnl_noble['Facility Name'].unique(), loc='upper right')
plt.grid()
plt.show()
(43) Why does Chalk River Laboratories have a peak in 2014? Why have the values come down so much since 2017, & why have they been zero since 2019?
df_cnl_xenon = df_cnl[df_cnl['Substance Name (English)'] == 'Xenon-133']
df_cnl_xenon.head()
| | Year | Facility Name | Substance Name (English) | Units | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|---|
| 11 | 2013 | Chalk River Laboratories | Xenon-133 | Bq | 5.720000e+15 | 0.0 |
| 37 | 2014 | Chalk River Laboratories | Xenon-133 | Bq | 8.680000e+15 | 0.0 |
| 67 | 2015 | Chalk River Laboratories | Xenon-133 | Bq | 4.890000e+15 | 0.0 |
| 97 | 2016 | Chalk River Laboratories | Xenon-133 | Bq | 3.120000e+15 | 0.0 |
# No Direct Discharge reported (addressed in question (3)).
plt.figure(figsize=(16,6))
year = df_cnl_xenon['Year'].unique()
for facility in df_cnl_xenon['Facility Name'].unique():
    plt.plot(year, df_cnl_xenon[df_cnl_xenon['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Xenon-133 - Stack Emissions [Bq]', size=12)
plt.legend(df_cnl_xenon['Facility Name'].unique(), loc='upper left')
plt.grid()
plt.show()
(44) Why hasn't Chalk River Laboratories reported values since 2017?
(45) Why does Chalk River Laboratories have a peak in 2014?
# For every facility, plot Stack Emissions and Direct Discharge for each substance it reports.
facilities = df_cnl['Facility Name'].unique()
for f in facilities:
    df = df_cnl[df_cnl['Facility Name'] == f]
    print(f,'\n')
    subs = df['Substance Name (English)'].unique()
    for s in subs:
        df2 = df[df['Substance Name (English)'] == s]
        fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10,8))
        fig.subplots_adjust(hspace=0.5)
        ax1.plot(df2['Year'], df2['Stack Emissions'], color='green')
        ax1.set_title(s + ' - Stack Emissions', fontdict={'fontsize': 14, 'fontweight': 'medium'})
        ax1.set_xlabel('Year', fontdict={'fontsize': 14, 'fontweight': 'medium'})
        ax1.grid()
        ax2.plot(df2['Year'], df2['Direct Discharge'], color='red')
        ax2.set_title(s + ' - Direct Discharge', fontdict={'fontsize': 14, 'fontweight': 'medium'})
        ax2.set_xlabel('Year', fontdict={'fontsize': 14, 'fontweight': 'medium'})
        ax2.grid()
        plt.show()
Chalk River Laboratories
Douglas Point
Nuclear Power Demonstration
Port Granby Project
Port Hope Project
Whiteshell Laboratories
df_cnl_2020 = pd.read_csv("./Datasets/2020/Canadian Nuclear Laboratories.csv", encoding='latin1')
df_cnl_2020.head()
| | Year \| Année | NPRI ID \| ID INRP | Company Name \| Raison Sociale | Facility Name \| Nom de l'installation | City \| Ville | CSD \| SDR | CA or CMA \| AR ou RMR | Economic Region \| Région économique | Province \| Province | Latitude \| Latitude | Longitude \| Longitude | Substance Name (English) \| Nom de substance (Anglais) | Substance Name (French) \| Nom de substance (Français) | Units \| Unités | Stack Emissions \| Émissions de cheminées | Direct Discharge \| Évacuations directes | Footnotes \| Notes de bas de page |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020 | 3147.0 | Canadian Nuclear Laboratories / Laboratoires N... | Chalk River Laboratories | Chalk River | Deep River | Renfrew | Kingston--Pembroke | ON | 46.0554 | -77.3628 | Elemental Tritium (HT) | Tritium élémentaire | Bq | 5.06E+12 | NRM \| NRS | NaN |
| 1 | 2020 | 3147.0 | Canadian Nuclear Laboratories / Laboratoires N... | Chalk River Laboratories | Chalk River | Deep River | Renfrew | Kingston--Pembroke | ON | 46.0554 | -77.3628 | Tritium (HTO) | Tritium (Eau tritiée) | Bq | 2.54E+13 | 1.08E+13 | NaN |
| 2 | 2020 | 3147.0 | Canadian Nuclear Laboratories / Laboratoires N... | Chalk River Laboratories | Chalk River | Deep River | Renfrew | Kingston--Pembroke | ON | 46.0554 | -77.3628 | Carbon-14 | Carbone-14 | Bq | 2.61E+10 | NRM \| NRS | NaN |
| 3 | 2020 | 3147.0 | Canadian Nuclear Laboratories / Laboratoires N... | Chalk River Laboratories | Chalk River | Deep River | Renfrew | Kingston--Pembroke | ON | 46.0554 | -77.3628 | Total noble gases | Total des gaz nobles | Bq-MeV | 0.00E+00 | NRM \| NRS | NaN |
| 4 | 2020 | 3147.0 | Canadian Nuclear Laboratories / Laboratoires N... | Chalk River Laboratories | Chalk River | Deep River | Renfrew | Kingston--Pembroke | ON | 46.0554 | -77.3628 | Iodine-131 | Iode-131 | Bq | 2.44E+07 | NRM \| NRS | NaN |
# I will keep only the essential columns: the other ones contain errors (in NPRI ID, for example) and are not needed for this comparison.
# .copy() makes this an independent dataframe, so the in-place rename below does not raise a SettingWithCopyWarning.
df_cnl_2021 = df_cnl_0[['Year | Année', "Facility Name | Nom de l'installation", 'Substance Name (English) | Nom de substance (Anglais)', 'Stack Emissions | Émissions de cheminées',
                        'Direct Discharge | Évacuations directes']].copy()
df_cnl_2021.rename(columns={'Year | Année': 'Year', "Facility Name | Nom de l'installation":'Facility Name', 'Substance Name (English) | Nom de substance (Anglais)':'Substance Name (English)', 'Stack Emissions | Émissions de cheminées':'Stack Emissions','Direct Discharge | Évacuations directes':'Direct Discharge'}, inplace = True)
df_cnl_2021.head()
| | Year | Facility Name | Substance Name (English) | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|
| 0 | 2021 | Chalk River Laboratories | Elemental Tritium (HT) | 2.08E+12 | NRM \| NRS |
| 1 | 2021 | Chalk River Laboratories | Tritium (HTO) | 2.49E+13 | 1.50E+13 |
| 2 | 2021 | Chalk River Laboratories | Carbon-14 | 0.00E+00 | NRM \| NRS |
| 3 | 2021 | Chalk River Laboratories | Total noble gases | 0.00E+00 | NRM \| NRS |
| 4 | 2021 | Chalk River Laboratories | Iodine-125 | 1.76E+06 | NRM \| NRS |
# I will remove the 2021 rows from the newer dataframe and compare the remaining years with the 2020 release.
df_cnl_2021 = df_cnl_2021[df_cnl_2021['Year'] != 2021].reset_index(drop = True)
df_cnl_2021.head()
| | Year | Facility Name | Substance Name (English) | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|
| 0 | 2020 | Chalk River Laboratories | Elemental Tritium (HT) | 5.06E+12 | NRM \| NRS |
| 1 | 2020 | Chalk River Laboratories | Tritium (HTO) | 2.54E+13 | 1.08E+13 |
| 2 | 2020 | Chalk River Laboratories | Carbon-14 | 2.61E+10 | NRM \| NRS |
| 3 | 2020 | Chalk River Laboratories | Total noble gases | 0.00E+00 | NRM \| NRS |
| 4 | 2020 | Chalk River Laboratories | Iodine-125 | 2.00E+06 | NRM \| NRS |
# Same column selection for the 2020 release: keep only the essential columns (the other ones contain errors, in NPRI ID for example).
# .copy() makes this an independent dataframe, so the in-place rename below does not raise a SettingWithCopyWarning.
df_cnl_2020 = df_cnl_2020[['Year | Année', "Facility Name | Nom de l'installation", 'Substance Name (English) | Nom de substance (Anglais)', 'Stack Emissions | Émissions de cheminées',
                           'Direct Discharge | Évacuations directes']].copy()
df_cnl_2020.rename(columns={'Year | Année': 'Year', "Facility Name | Nom de l'installation":'Facility Name', 'Substance Name (English) | Nom de substance (Anglais)':'Substance Name (English)', 'Stack Emissions | Émissions de cheminées':'Stack Emissions','Direct Discharge | Évacuations directes':'Direct Discharge'}, inplace = True)
df_cnl_2020.head()
| | Year | Facility Name | Substance Name (English) | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|
| 0 | 2020 | Chalk River Laboratories | Elemental Tritium (HT) | 5.06E+12 | NRM \| NRS |
| 1 | 2020 | Chalk River Laboratories | Tritium (HTO) | 2.54E+13 | 1.08E+13 |
| 2 | 2020 | Chalk River Laboratories | Carbon-14 | 2.61E+10 | NRM \| NRS |
| 3 | 2020 | Chalk River Laboratories | Total noble gases | 0.00E+00 | NRM \| NRS |
| 4 | 2020 | Chalk River Laboratories | Iodine-131 | 2.44E+07 | NRM \| NRS |
# I will concatenate both dataframes & keep only the non-duplicated rows to see the differences. This produces a df with the 2021 release's values first, and the 2020 release's values after:
df = pd.concat([df_cnl_2021,df_cnl_2020]).drop_duplicates(keep=False)
df = df.sort_values(by=['Facility Name', 'Substance Name (English)', 'Year'])
df
# I will explore the differences:
| | Year | Facility Name | Substance Name (English) | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|
| 74 | 2018 | Chalk River Laboratories | Argon-41 | 2.64E+15 | NRM \| NRS |
| 69 | 2018 | Chalk River Laboratories | Argon-41 | 2.59E+15 | NRM \| NRS |
| 143 | 2016 | Chalk River Laboratories | Carbon-14 | 4.85E+11 | NRM \| NRS |
| 135 | 2016 | Chalk River Laboratories | Carbon-14 | 4.84E+11 | NRM \| NRS |
| 107 | 2017 | Chalk River Laboratories | Carbon-14 | 4.91E+11 | NRM \| NRS |
| 101 | 2017 | Chalk River Laboratories | Carbon-14 | 4.90E+11 | NRM \| NRS |
| 70 | 2018 | Chalk River Laboratories | Carbon-14 | 2.59E+11 | NRM \| NRS |
| 66 | 2018 | Chalk River Laboratories | Carbon-14 | 2.54E+11 | NRM \| NRS |
| 44 | 2019 | Chalk River Laboratories | Estimated public dose (see footnote) | 0.0039 | NRM \| NRS |
| 40 | 2019 | Chalk River Laboratories | Estimated public dose (see footnote) | 0.0038 | NRM \| NRS |
| 10 | 2020 | Chalk River Laboratories | Estimated public dose (see footnote) | 0.0074 | NRM \| NRS |
| 8 | 2020 | Chalk River Laboratories | Estimated public dose (see footnote) | 0.0059 | NRM \| NRS |
| 236 | 2013 | Chalk River Laboratories | Iodine-125 | 2.84E+08 | NRM \| NRS |
| 206 | 2014 | Chalk River Laboratories | Iodine-125 | 1.62E+08 | NRM \| NRS |
| 176 | 2015 | Chalk River Laboratories | Iodine-125 | 3.44E+08 | NRM \| NRS |
| 145 | 2016 | Chalk River Laboratories | Iodine-125 | 2.91E+08 | NRM \| NRS |
| 109 | 2017 | Chalk River Laboratories | Iodine-125 | 5.30E+08 | NRM \| NRS |
| 72 | 2018 | Chalk River Laboratories | Iodine-125 | 9.67E+07 | NRM \| NRS |
| 38 | 2019 | Chalk River Laboratories | Iodine-125 | 2.44E+06 | NRM \| NRS |
| 4 | 2020 | Chalk River Laboratories | Iodine-125 | 2.00E+06 | NRM \| NRS |
| 110 | 2017 | Chalk River Laboratories | Iodine-131 | 3.78E+08 | NRM \| NRS |
| 103 | 2017 | Chalk River Laboratories | Iodine-131 | 3.82E+08 | NRM \| NRS |
| 73 | 2018 | Chalk River Laboratories | Iodine-131 | 1.05E+08 | NRM \| NRS |
| 68 | 2018 | Chalk River Laboratories | Iodine-131 | 1.02E+08 | NRM \| NRS |
| 242 | 2013 | Chalk River Laboratories | Strontium-90 | NRM \| NRS | 1.51E+10 |
| 212 | 2014 | Chalk River Laboratories | Strontium-90 | NRM \| NRS | 2.26E+11 |
| 182 | 2015 | Chalk River Laboratories | Strontium-90 | NRM \| NRS | 6.70E+10 |
| 151 | 2016 | Chalk River Laboratories | Strontium-90 | NRM \| NRS | 7.30E+09 |
| 114 | 2017 | Chalk River Laboratories | Strontium-90 | NRM \| NRS | 1.66E+10 |
| 77 | 2018 | Chalk River Laboratories | Strontium-90 | NRM \| NRS | 8.72E+09 |
| 43 | 2019 | Chalk River Laboratories | Strontium-90 | NRM \| NRS | 2.57E+09 |
| 9 | 2020 | Chalk River Laboratories | Strontium-90 | NRM \| NRS | 7.07E+09 |
| 144 | 2016 | Chalk River Laboratories | Total noble gases | 3.97E+14 | NRM \| NRS |
| 136 | 2016 | Chalk River Laboratories | Total noble gases | 8.50E+14 | NRM \| NRS |
| 142 | 2016 | Chalk River Laboratories | Tritium (HTO) | 2.45E+14 | 3.50E+13 |
| 134 | 2016 | Chalk River Laboratories | Tritium (HTO) | 2.30E+14 | 3.50E+13 |
| 106 | 2017 | Chalk River Laboratories | Tritium (HTO) | 2.53E+14 | 3.81E+13 |
| 100 | 2017 | Chalk River Laboratories | Tritium (HTO) | 2.50E+14 | 3.81E+13 |
| 69 | 2018 | Chalk River Laboratories | Tritium (HTO) | 2.34E+14 | 1.93E+13 |
| 65 | 2018 | Chalk River Laboratories | Tritium (HTO) | 2.29E+14 | 1.93E+13 |
| 35 | 2019 | Chalk River Laboratories | Tritium (HTO) | 2.01E+14 | 1.31E+13 |
| 33 | 2019 | Chalk River Laboratories | Tritium (HTO) | 2.01E+14 | 1.37E+13 |
| 20 | 2020 | Whiteshell Laboratories | Estimated public dose (see footnote) | 3.00E-06 | NRM \| NRS |
| 18 | 2020 | Whiteshell Laboratories | Estimated public dose (see footnote) | 0.00E+00 | NRM \| NRS |
df_cnl_2021[(df_cnl_2021['Facility Name'] == 'Chalk River Laboratories') & (df_cnl_2021['Substance Name (English)'] == 'Iodine-125')]
| | Year | Facility Name | Substance Name (English) | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|
| 4 | 2020 | Chalk River Laboratories | Iodine-125 | 2.00E+06 | NRM \| NRS |
| 38 | 2019 | Chalk River Laboratories | Iodine-125 | 2.44E+06 | NRM \| NRS |
| 72 | 2018 | Chalk River Laboratories | Iodine-125 | 9.67E+07 | NRM \| NRS |
| 109 | 2017 | Chalk River Laboratories | Iodine-125 | 5.30E+08 | NRM \| NRS |
| 145 | 2016 | Chalk River Laboratories | Iodine-125 | 2.91E+08 | NRM \| NRS |
| 176 | 2015 | Chalk River Laboratories | Iodine-125 | 3.44E+08 | NRM \| NRS |
| 206 | 2014 | Chalk River Laboratories | Iodine-125 | 1.62E+08 | NRM \| NRS |
| 236 | 2013 | Chalk River Laboratories | Iodine-125 | 2.84E+08 | NRM \| NRS |
df_cnl_2020[(df_cnl_2020['Facility Name'] == 'Chalk River Laboratories') & (df_cnl_2020['Substance Name (English)'] == 'Iodine-125')]
| | Year | Facility Name | Substance Name (English) | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|
(46) Where did the values for Chalk River Laboratories Iodine-125 come from? They were not there in 2020's dataframe.
df_cnl_2021[(df_cnl_2021['Facility Name'] == 'Chalk River Laboratories') & (df_cnl_2021['Substance Name (English)'] == 'Strontium-90')]
| | Year | Facility Name | Substance Name (English) | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|
| 9 | 2020 | Chalk River Laboratories | Strontium-90 | NRM \| NRS | 7.07E+09 |
| 43 | 2019 | Chalk River Laboratories | Strontium-90 | NRM \| NRS | 2.57E+09 |
| 77 | 2018 | Chalk River Laboratories | Strontium-90 | NRM \| NRS | 8.72E+09 |
| 114 | 2017 | Chalk River Laboratories | Strontium-90 | NRM \| NRS | 1.66E+10 |
| 151 | 2016 | Chalk River Laboratories | Strontium-90 | NRM \| NRS | 7.30E+09 |
| 182 | 2015 | Chalk River Laboratories | Strontium-90 | NRM \| NRS | 6.70E+10 |
| 212 | 2014 | Chalk River Laboratories | Strontium-90 | NRM \| NRS | 2.26E+11 |
| 242 | 2013 | Chalk River Laboratories | Strontium-90 | NRM \| NRS | 1.51E+10 |
df_cnl_2020[(df_cnl_2020['Facility Name'] == 'Chalk River Laboratories') & (df_cnl_2020['Substance Name (English)'] == 'Strontium-90')]
| | Year | Facility Name | Substance Name (English) | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|
(47) Where did the values for Chalk River Laboratories Strontium-90 come from? They were not there in 2020's dataframe.
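Questions (46) and (47) concern facility/substance/year combinations that exist only in the newer release. A minimal sketch of an anti-join that lists such rows directly is shown below. The helper name `only_in_newer` and the two tiny demo frames are hypothetical; the demo values are copied from the tables above. It relies on the `indicator=True` option of `DataFrame.merge`.

```python
import pandas as pd

KEYS = ["Year", "Facility Name", "Substance Name (English)"]


def only_in_newer(older, newer):
    """Rows whose key appears in `newer` but not in `older` (an anti-join)."""
    merged = newer.merge(older[KEYS].drop_duplicates(), on=KEYS,
                         how="left", indicator=True)
    return merged[merged["_merge"] == "left_only"].drop(columns="_merge")


# Tiny illustrative frames, not the full releases.
older = pd.DataFrame({
    "Year": [2020],
    "Facility Name": ["Chalk River Laboratories"],
    "Substance Name (English)": ["Iodine-131"],
    "Stack Emissions": ["2.44E+07"],
})
newer = pd.DataFrame({
    "Year": [2020, 2020],
    "Facility Name": ["Chalk River Laboratories"] * 2,
    "Substance Name (English)": ["Iodine-131", "Iodine-125"],
    "Stack Emissions": ["2.44E+07", "2.00E+06"],
})
added = only_in_newer(older, newer)
print(added)
```

Applied to the real data, `only_in_newer(df_cnl_2020, df_cnl_2021)` would presumably return the Iodine-125 and Strontium-90 rows, assuming both dataframes carry the renamed columns.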
df[(df['Substance Name (English)'] != 'Iodine-125') & (df['Substance Name (English)'] != 'Strontium-90')]
| | Year | Facility Name | Substance Name (English) | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|
| 74 | 2018 | Chalk River Laboratories | Argon-41 | 2.64E+15 | NRM \| NRS |
| 69 | 2018 | Chalk River Laboratories | Argon-41 | 2.59E+15 | NRM \| NRS |
| 143 | 2016 | Chalk River Laboratories | Carbon-14 | 4.85E+11 | NRM \| NRS |
| 135 | 2016 | Chalk River Laboratories | Carbon-14 | 4.84E+11 | NRM \| NRS |
| 107 | 2017 | Chalk River Laboratories | Carbon-14 | 4.91E+11 | NRM \| NRS |
| 101 | 2017 | Chalk River Laboratories | Carbon-14 | 4.90E+11 | NRM \| NRS |
| 70 | 2018 | Chalk River Laboratories | Carbon-14 | 2.59E+11 | NRM \| NRS |
| 66 | 2018 | Chalk River Laboratories | Carbon-14 | 2.54E+11 | NRM \| NRS |
| 44 | 2019 | Chalk River Laboratories | Estimated public dose (see footnote) | 0.0039 | NRM \| NRS |
| 40 | 2019 | Chalk River Laboratories | Estimated public dose (see footnote) | 0.0038 | NRM \| NRS |
| 10 | 2020 | Chalk River Laboratories | Estimated public dose (see footnote) | 0.0074 | NRM \| NRS |
| 8 | 2020 | Chalk River Laboratories | Estimated public dose (see footnote) | 0.0059 | NRM \| NRS |
| 110 | 2017 | Chalk River Laboratories | Iodine-131 | 3.78E+08 | NRM \| NRS |
| 103 | 2017 | Chalk River Laboratories | Iodine-131 | 3.82E+08 | NRM \| NRS |
| 73 | 2018 | Chalk River Laboratories | Iodine-131 | 1.05E+08 | NRM \| NRS |
| 68 | 2018 | Chalk River Laboratories | Iodine-131 | 1.02E+08 | NRM \| NRS |
| 144 | 2016 | Chalk River Laboratories | Total noble gases | 3.97E+14 | NRM \| NRS |
| 136 | 2016 | Chalk River Laboratories | Total noble gases | 8.50E+14 | NRM \| NRS |
| 142 | 2016 | Chalk River Laboratories | Tritium (HTO) | 2.45E+14 | 3.50E+13 |
| 134 | 2016 | Chalk River Laboratories | Tritium (HTO) | 2.30E+14 | 3.50E+13 |
| 106 | 2017 | Chalk River Laboratories | Tritium (HTO) | 2.53E+14 | 3.81E+13 |
| 100 | 2017 | Chalk River Laboratories | Tritium (HTO) | 2.50E+14 | 3.81E+13 |
| 69 | 2018 | Chalk River Laboratories | Tritium (HTO) | 2.34E+14 | 1.93E+13 |
| 65 | 2018 | Chalk River Laboratories | Tritium (HTO) | 2.29E+14 | 1.93E+13 |
| 35 | 2019 | Chalk River Laboratories | Tritium (HTO) | 2.01E+14 | 1.31E+13 |
| 33 | 2019 | Chalk River Laboratories | Tritium (HTO) | 2.01E+14 | 1.37E+13 |
| 20 | 2020 | Whiteshell Laboratories | Estimated public dose (see footnote) | 3.00E-06 | NRM \| NRS |
| 18 | 2020 | Whiteshell Laboratories | Estimated public dose (see footnote) | 0.00E+00 | NRM \| NRS |
(48) Why did these 14 sets of values change between reports? Why wasn't this addressed anywhere?
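The size of each revision can be quantified by pairing the two releases on their keys instead of reading the stacked rows by eye. The sketch below is one possible approach, not the method used in this notebook. The helper name `changed_values` is hypothetical, and the demo uses two values from the table above (Total noble gases for 2016, and the unchanged 2020 Tritium (HTO) stack value).

```python
import pandas as pd

KEYS = ["Year", "Facility Name", "Substance Name (English)"]


def changed_values(older, newer, column):
    """Pair up rows by key and keep those whose numeric `column` value differs."""
    both = older.merge(newer, on=KEYS, suffixes=("_old", "_new"))
    # Codes such as 'NRM | NRS' become NaN and are excluded from the comparison.
    old = pd.to_numeric(both[f"{column}_old"], errors="coerce")
    new = pd.to_numeric(both[f"{column}_new"], errors="coerce")
    both["relative_change"] = (new - old) / old
    return both[old.ne(new) & old.notna() & new.notna()]


# Illustrative frames: one revised value and one unchanged value.
older = pd.DataFrame({
    "Year": [2016, 2020],
    "Facility Name": ["Chalk River Laboratories"] * 2,
    "Substance Name (English)": ["Total noble gases", "Tritium (HTO)"],
    "Stack Emissions": ["8.50E+14", "2.54E+13"],
})
newer = older.assign(**{"Stack Emissions": ["3.97E+14", "2.54E+13"]})
diff = changed_values(older, newer, "Stack Emissions")
print(diff[KEYS + ["Stack Emissions_old", "Stack Emissions_new", "relative_change"]])
```

In this demo the Total noble gases value for 2016 drops by roughly 53% between releases, which is a much larger revision than the changes of a few percent seen for Carbon-14 or Tritium (HTO).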
df_umm = pd.read_csv("./Datasets/Uranium Mines and Mills.csv")
df_umm
| | Year \| Année | NPRI ID \| ID INRP | Company Name \| Raison Sociale | Facility Name \| Nom de l'installation | City \| Ville | CSD \| SDR | CA or CMA \| AR ou RMR | Economic Region \| Région économique | Province \| Province | Latitude \| Latitude | Longitude \| Longitude | Substance Name (English) \| Nom de substance (Anglais) | Substance Name (French) \| Nom de substance (Français) | Units \| Unités | Stack Emissions \| Émissions de cheminées | Direct Discharge \| Évacuations directes | Footnotes \| Notes de bas de page | Unnamed: 17 | Unnamed: 18 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021 | 1147 | Cameco | Rabbit Lake | Saskatoon | NaN | NaN | NaN | SK | 58.1978 | -103.7136 | Uranium | Uranium | kg | NRM \| NRS | 68.9 | NaN | NaN | NaN |
| 1 | 2021 | 1147 | Cameco | Rabbit Lake | Saskatoon | NaN | NaN | NaN | SK | 58.1978 | -103.7136 | Thorium-230 | Thorium-230 | MBq | NRM \| NRS | 57.5 | NaN | NaN | NaN |
| 2 | 2021 | 1147 | Cameco | Rabbit Lake | Saskatoon | NaN | NaN | NaN | SK | 58.1978 | -103.7136 | Radium-226 | Radium-226 | MBq | NRM \| NRS | 22.6 | NaN | NaN | NaN |
| 3 | 2021 | 1147 | Cameco | Rabbit Lake | Saskatoon | NaN | NaN | NaN | SK | 58.1978 | -103.7136 | Lead-210 | Plomb-210 | MBq | NRM \| NRS | 76.6 | NaN | NaN | NaN |
| 4 | 2021 | 1147 | Cameco | Rabbit Lake | Saskatoon | NaN | NaN | NaN | SK | 58.1978 | -103.7136 | Polonium-210 | Polonium-210 | MBq | NRM \| NRS | 43.1 | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 220 | 2013 | 4866 | Orano | McClean Lake | Saskatoon | NaN | NaN | NaN | SK | 58.2611 | -103.8025 | Uranium | Uranium | kg | NRM \| NRS | 1.8 | NaN | NaN | NaN |
| 221 | 2013 | 4866 | Orano | McClean Lake | Saskatoon | NaN | NaN | NaN | SK | 58.2611 | -103.8025 | Thorium-230 | Thorium-230 | MBq | NRM \| NRS | 19.6 | NaN | NaN | NaN |
| 222 | 2013 | 4866 | Orano | McClean Lake | Saskatoon | NaN | NaN | NaN | SK | 58.2611 | -103.8025 | Radium-226 | Radium-226 | MBq | NRM \| NRS | 6.0 | NaN | NaN | NaN |
| 223 | 2013 | 4866 | Orano | McClean Lake | Saskatoon | NaN | NaN | NaN | SK | 58.2611 | -103.8025 | Lead-210 | Plomb-210 | MBq | NRM \| NRS | 74.4 | NaN | NaN | NaN |
| 224 | 2013 | 4866 | Orano | McClean Lake | Saskatoon | NaN | NaN | NaN | SK | 58.2611 | -103.8025 | Polonium-210 | Polonium-210 | MBq | NRM \| NRS | 17.7 | NaN | NaN | NaN |
225 rows × 19 columns
(1) Why does the data start in 2013? Can we get older data?
# I'm creating a copy because I will need it later.
df_umm_0 = df_umm.copy()
df_umm["Facility Name | Nom de l'installation"].unique()
array(['Rabbit Lake', 'Key Lake', 'McArthur River', 'Cigar Lake',
'McClean Lake'], dtype=object)
# All Stack Emissions are NRM (Not Required to Monitor). So I will focus this analysis on Direct Discharge only.
df_umm['Stack Emissions | Émissions de cheminées'].unique()
array(['NRM | NRS'], dtype=object)
(2) Why don't they monitor & report Stack Emissions? Can you confirm that there are no radionuclide stack emissions at all?
# Renaming columns to English only:
df_umm.rename(columns={'Year | Année': 'Year', 'NPRI ID | ID INRP': 'NPRI ID','Company Name | Raison Sociale':'Company Name',"Facility Name | Nom de l'installation":'Facility Name', 'City | Ville':'City', 'CSD | SDR':'CSD','CA or CMA | AR ou RMR':'CA or CMA', 'Economic Region | Région économique':'Economic Region','Province | Province':'Province', 'Latitude | Latitude':'Latitude', 'Longitude | Longitude':'Longitude','Substance Name (English) | Nom de substance (Anglais)':'Substance Name (English)','Units | Unités':'Units', 'Stack Emissions | Émissions de cheminées':'Stack Emissions','Direct Discharge | Évacuations directes':'Direct Discharge','Footnotes | Notes de bas de page':'Footnotes'}, inplace = True)
df_umm.head()
| | Year | NPRI ID | Company Name | Facility Name | City | CSD | CA or CMA | Economic Region | Province | Latitude | Longitude | Substance Name (English) | Substance Name (French) \| Nom de substance (Français) | Units | Stack Emissions | Direct Discharge | Footnotes | Unnamed: 17 | Unnamed: 18 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021 | 1147 | Cameco | Rabbit Lake | Saskatoon | NaN | NaN | NaN | SK | 58.1978 | -103.7136 | Uranium | Uranium | kg | NRM \| NRS | 68.9 | NaN | NaN | NaN |
| 1 | 2021 | 1147 | Cameco | Rabbit Lake | Saskatoon | NaN | NaN | NaN | SK | 58.1978 | -103.7136 | Thorium-230 | Thorium-230 | MBq | NRM \| NRS | 57.5 | NaN | NaN | NaN |
| 2 | 2021 | 1147 | Cameco | Rabbit Lake | Saskatoon | NaN | NaN | NaN | SK | 58.1978 | -103.7136 | Radium-226 | Radium-226 | MBq | NRM \| NRS | 22.6 | NaN | NaN | NaN |
| 3 | 2021 | 1147 | Cameco | Rabbit Lake | Saskatoon | NaN | NaN | NaN | SK | 58.1978 | -103.7136 | Lead-210 | Plomb-210 | MBq | NRM \| NRS | 76.6 | NaN | NaN | NaN |
| 4 | 2021 | 1147 | Cameco | Rabbit Lake | Saskatoon | NaN | NaN | NaN | SK | 58.1978 | -103.7136 | Polonium-210 | Polonium-210 | MBq | NRM \| NRS | 43.1 | NaN | NaN | NaN |
# After checking Direct Discharge, I noticed there is a value "DL" & a 0 value:
df_umm[(df_umm['Direct Discharge'] == 'DL') | (df_umm['Direct Discharge'] == '0.0')]
| | Year | NPRI ID | Company Name | Facility Name | City | CSD | CA or CMA | Economic Region | Province | Latitude | Longitude | Substance Name (English) | Substance Name (French) \| Nom de substance (Français) | Units | Stack Emissions | Direct Discharge | Footnotes | Unnamed: 17 | Unnamed: 18 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 15 | 2021 | 19397 | Cameco | Cigar Lake | Saskatoon | NaN | NaN | NaN | SK | 58.0686 | -104.5406 | Uranium | Uranium | kg | NRM \| NRS | 0.0 | NaN | NaN | NaN |
| 203 | 2013 | 1147 | Cameco | Rabbit Lake | Saskatoon | NaN | NaN | NaN | SK | 58.1978 | -103.7136 | Lead-210 | Plomb-210 | MBq | NRM \| NRS | DL | Loadings were less than the detection limit \| ... | NaN | NaN |
(3) Summary of Special Values in Direct Discharge:
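One way to tabulate such special values is to count every entry in a column that cannot be parsed as a number. This is a minimal sketch: the helper name `special_values` is hypothetical, and the demo series is illustrative rather than the real column, mixing numeric strings with the 'DL' code seen above.

```python
import pandas as pd


def special_values(series):
    """Count entries that cannot be parsed as numbers (e.g. 'DL', 'NRM | NRS')."""
    as_number = pd.to_numeric(series, errors="coerce")
    return series[as_number.isna() & series.notna()].value_counts()


# Illustrative column, not the real data.
demo = pd.Series(["68.9", "DL", "0.0", "57.5"], name="Direct Discharge")
counts = special_values(demo)
print(counts)
```

The same call on `df_umm['Direct Discharge']`, made before the column is converted to numeric, would presumably list 'DL' with its count; reported zeros are numeric and would need a separate filter.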
# Cleaning the geography dataframe:
df_umm_geography = df_umm[['Facility Name', 'NPRI ID', 'Company Name', 'City', 'CSD', 'CA or CMA', 'Economic Region', 'Province', 'Latitude', 'Longitude']]
# Chained instead of inplace=True, which raises a SettingWithCopyWarning on a column selection.
df_umm_geography = df_umm_geography.drop_duplicates().reset_index(drop=True)
df_umm_geography
| | Facility Name | NPRI ID | Company Name | City | CSD | CA or CMA | Economic Region | Province | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Rabbit Lake | 1147 | Cameco | Saskatoon | NaN | NaN | NaN | SK | 58.1978 | -103.7136 |
| 1 | Key Lake | 1148 | Cameco | Saskatoon | NaN | NaN | NaN | SK | 57.2067 | -105.6592 |
| 2 | McArthur River | 1149 | Cameco | Saskatoon | NaN | NaN | NaN | SK | 57.7625 | -105.0519 |
| 3 | Cigar Lake | 19397 | Cameco | Saskatoon | NaN | NaN | NaN | SK | 58.0686 | -104.5406 |
| 4 | McClean Lake | 4866 | Orano | Saskatoon | NaN | NaN | NaN | SK | 58.2611 | -103.8025 |
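# Replacing the DL (below detection limit) value with 0 so I can convert the column to numeric for later plotting.
# Assigning the result back is more reliable than a chained inplace=True on a single column:
df_umm['Direct Discharge'] = df_umm['Direct Discharge'].replace('DL', 0)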
# Converted columns to numeric for plotting:
df_umm['Direct Discharge'] = pd.to_numeric(df_umm['Direct Discharge'])
df_umm.drop(columns=['NPRI ID', 'Company Name', 'City', 'CSD', 'CA or CMA', 'Economic Region', 'Province', 'Latitude', 'Longitude', 'Substance Name (French) | Nom de substance (Français)', 'Stack Emissions', 'Footnotes', 'Unnamed: 17', 'Unnamed: 18'], inplace=True)
df_umm.head()
| | Year | Facility Name | Substance Name (English) | Units | Direct Discharge |
|---|---|---|---|---|---|
| 0 | 2021 | Rabbit Lake | Uranium | kg | 68.9 |
| 1 | 2021 | Rabbit Lake | Thorium-230 | MBq | 57.5 |
| 2 | 2021 | Rabbit Lake | Radium-226 | MBq | 22.6 |
| 3 | 2021 | Rabbit Lake | Lead-210 | MBq | 76.6 |
| 4 | 2021 | Rabbit Lake | Polonium-210 | MBq | 43.1 |
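The cleaning step above replaces 'DL' with 0, which makes a below-detection-limit entry indistinguishable from a reported zero in later plots. A hedged alternative is to record a flag before overwriting the code. The column name `Below Detection Limit` and the demo frame are hypothetical additions, not part of the original notebook.

```python
import pandas as pd

# Illustrative frame with one numeric string, one 'DL' code and one true zero.
demo = pd.DataFrame({"Direct Discharge": ["76.6", "DL", "0.0"]})

# Record which rows were reported as below the detection limit before the
# code is overwritten, so a true 0.0 can still be told apart from 'DL'.
demo["Below Detection Limit"] = demo["Direct Discharge"].eq("DL")
demo["Direct Discharge"] = pd.to_numeric(
    demo["Direct Discharge"].replace("DL", "0")
)
print(demo)
```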
# First, I'm saving the clean dataframe to do a dashboard in Tableau.
df_umm.to_csv("./Datasets/df_umm.csv", index=True, header=True)
df_umm_uranium = df_umm[df_umm['Substance Name (English)'] == 'Uranium']
df_umm_uranium.head()
| | Year | Facility Name | Substance Name (English) | Units | Direct Discharge |
|---|---|---|---|---|---|
| 0 | 2021 | Rabbit Lake | Uranium | kg | 68.9 |
| 5 | 2021 | Key Lake | Uranium | kg | 49.1 |
| 10 | 2021 | McArthur River | Uranium | kg | 18.5 |
| 15 | 2021 | Cigar Lake | Uranium | kg | 0.0 |
| 20 | 2021 | McClean Lake | Uranium | kg | 10.2 |
plt.figure(figsize=(16,6))
year = df_umm_uranium['Year'].unique()
for facility in df_umm_uranium['Facility Name'].unique():
    plt.plot(year, df_umm_uranium[df_umm_uranium['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Uranium - Direct Discharge [kg]', size=12)
plt.legend(df_umm_uranium['Facility Name'].unique(), loc='upper right')
plt.grid()
plt.show()
(4) Why does Rabbit Lake discharge so much more than the rest, & why the peaks in 2013 & 2016?
# plotting without Rabbit Lake to have a better look at the rest.
df_umm_uranium_2 = df_umm_uranium[df_umm_uranium['Facility Name'] != 'Rabbit Lake']
plt.figure(figsize=(16,6))
year = df_umm_uranium_2['Year'].unique()
for facility in df_umm_uranium_2['Facility Name'].unique():
    plt.plot(year, df_umm_uranium_2[df_umm_uranium_2['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Uranium - Direct Discharge [kg]', size=12)
plt.legend(df_umm_uranium_2['Facility Name'].unique(), loc='upper left')
plt.grid()
plt.show()
(5) Why does Key Lake have a spike from 2018 to 2021, with peaks in 2019 & 2021?
(6) Why does Cigar Lake have a peak in 2015? (from 6.6 kg to 38 kg, then back to 2.4 kg)
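The peaks in questions like the ones above were spotted by eye on the charts. As a possible cross-check, here is a minimal sketch that flags any value exceeding a multiple of the median for the same facility and substance. The function name `flag_peaks`, the `ratio` threshold of 3, and the toy rows are my own assumptions for illustration; they are not part of the CNSC dataset or of the analysis above.

```python
import pandas as pd

# Toy rows shaped like df_umm. The values are illustrative, NOT taken from the CNSC dataset.
toy = pd.DataFrame({
    "Year": [2013, 2014, 2015, 2016, 2017] * 2,
    "Facility Name": ["Facility A"] * 5 + ["Facility B"] * 5,
    "Substance Name (English)": ["Uranium"] * 10,
    "Direct Discharge": [5.0, 6.0, 30.0, 5.5, 6.5, 10.0, 11.0, 9.0, 10.5, 9.5],
})

def flag_peaks(df, ratio=3.0):
    """Return rows whose discharge exceeds `ratio` times the median
    of the same facility and substance (hypothetical helper)."""
    keys = ["Facility Name", "Substance Name (English)"]
    # transform("median") broadcasts each group's median back onto its rows.
    median = df.groupby(keys)["Direct Discharge"].transform("median")
    return df[df["Direct Discharge"] > ratio * median]

peaks = flag_peaks(toy)
print(peaks[["Year", "Facility Name", "Direct Discharge"]])
```

Calling `flag_peaks(df_umm)` would apply the same rule to the real dataframe. The threshold is arbitrary, and a series whose median is zero will have every positive value flagged, so the output is only a list of candidates to review against the charts.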
df_umm_thorium = df_umm[df_umm['Substance Name (English)'] == 'Thorium-230']
df_umm_thorium.head()
| | Year | Facility Name | Substance Name (English) | Units | Direct Discharge |
|---|---|---|---|---|---|
| 1 | 2021 | Rabbit Lake | Thorium-230 | MBq | 57.5 |
| 6 | 2021 | Key Lake | Thorium-230 | MBq | 23.7 |
| 11 | 2021 | McArthur River | Thorium-230 | MBq | 26.2 |
| 16 | 2021 | Cigar Lake | Thorium-230 | MBq | 3.5 |
| 21 | 2021 | McClean Lake | Thorium-230 | MBq | 18.1 |
plt.figure(figsize=(16,6))
year = df_umm_thorium['Year'].unique()
for facility in df_umm_thorium['Facility Name'].unique():
    plt.plot(year, df_umm_thorium[df_umm_thorium['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Thorium-230 - Direct Discharge [MBq]', size=12)
plt.legend(df_umm_thorium['Facility Name'].unique(), loc='upper right')
plt.grid()
plt.show()
(7) Why does Rabbit Lake have a peak in 2017?
(8) Why does Key Lake have a spike from 2015 to 2017 & a peak in 2020?
df_umm_radium = df_umm[df_umm['Substance Name (English)'] == 'Radium-226']
df_umm_radium.head()
| | Year | Facility Name | Substance Name (English) | Units | Direct Discharge |
|---|---|---|---|---|---|
| 2 | 2021 | Rabbit Lake | Radium-226 | MBq | 22.6 |
| 7 | 2021 | Key Lake | Radium-226 | MBq | 42.1 |
| 12 | 2021 | McArthur River | Radium-226 | MBq | 106.7 |
| 17 | 2021 | Cigar Lake | Radium-226 | MBq | 2.3 |
| 22 | 2021 | McClean Lake | Radium-226 | MBq | 17.8 |
plt.figure(figsize=(16,6))
year = df_umm_radium['Year'].unique()
for facility in df_umm_radium['Facility Name'].unique():
    plt.plot(year, df_umm_radium[df_umm_radium['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Radium-226 - Direct Discharge [MBq]', size=12)
plt.legend(df_umm_radium['Facility Name'].unique(), loc='upper right')
plt.grid()
plt.show()
(9) Why does McArthur River have significantly more discharge than the rest?
(10) Why does Key Lake have a peak in 2019?
df_umm_lead = df_umm[df_umm['Substance Name (English)'] == 'Lead-210']
df_umm_lead.head()
| | Year | Facility Name | Substance Name (English) | Units | Direct Discharge |
|---|---|---|---|---|---|
| 3 | 2021 | Rabbit Lake | Lead-210 | MBq | 76.6 |
| 8 | 2021 | Key Lake | Lead-210 | MBq | 27.2 |
| 13 | 2021 | McArthur River | Lead-210 | MBq | 56.1 |
| 18 | 2021 | Cigar Lake | Lead-210 | MBq | 7.7 |
| 23 | 2021 | McClean Lake | Lead-210 | MBq | 34.3 |
plt.figure(figsize=(16,6))
year = df_umm_lead['Year'].unique()
for facility in df_umm_lead['Facility Name'].unique():
    plt.plot(year, df_umm_lead[df_umm_lead['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Lead-210 - Direct Discharge [MBq]', size=12)
plt.legend(df_umm_lead['Facility Name'].unique(), loc='upper left')
plt.grid()
plt.show()
(11) Why does Rabbit Lake release significantly more than the rest & have a spike from 2015 to 2019, with peaks in 2015, 2016 & 2018?
df_umm_polonium = df_umm[df_umm['Substance Name (English)'] == 'Polonium-210']
df_umm_polonium.head()
| | Year | Facility Name | Substance Name (English) | Units | Direct Discharge |
|---|---|---|---|---|---|
| 4 | 2021 | Rabbit Lake | Polonium-210 | MBq | 43.1 |
| 9 | 2021 | Key Lake | Polonium-210 | MBq | 35.5 |
| 14 | 2021 | McArthur River | Polonium-210 | MBq | 14.7 |
| 19 | 2021 | Cigar Lake | Polonium-210 | MBq | 3.5 |
| 24 | 2021 | McClean Lake | Polonium-210 | MBq | 20.2 |
plt.figure(figsize=(16,6))
year = df_umm_polonium['Year'].unique()
for facility in df_umm_polonium['Facility Name'].unique():
    plt.plot(year, df_umm_polonium[df_umm_polonium['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Polonium-210 - Direct Discharge [MBq]', size=12)
plt.legend(df_umm_polonium['Facility Name'].unique(), loc='upper left')
plt.grid()
plt.show()
(12) Why does McArthur River have a peak in 2015?
(13) Why does Rabbit Lake have a peak in 2013?
(14) Why does Key Lake have a peak in 2014?
(15) Why does McClean Lake have a peak in 2016?
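The per-substance charts above all repeat the same filter-and-plot pattern, and they plot every facility against one shared list of years, which only works if every facility reports every year. As an alternative, here is a minimal sketch that reshapes one substance into a wide table (one row per year, one column per facility), so a missing year shows up as NaN instead of breaking the plot. The helper name `to_wide` and the toy rows are assumptions for illustration only.

```python
import pandas as pd

# Illustrative rows only (not CNSC values); Facility B has no 2020 entry on purpose.
toy = pd.DataFrame({
    "Year": [2021, 2020, 2019, 2021, 2019],
    "Facility Name": ["Facility A"] * 3 + ["Facility B"] * 2,
    "Substance Name (English)": ["Uranium"] * 5,
    "Direct Discharge": [1.0, 2.0, 3.0, 4.0, 5.0],
})

def to_wide(df, substance):
    """One row per year, one column per facility, for a single substance
    (hypothetical helper)."""
    subset = df[df["Substance Name (English)"] == substance]
    # pivot raises if a (Year, Facility Name) pair occurs twice,
    # which doubles as a check for duplicated records.
    wide = subset.pivot(index="Year", columns="Facility Name", values="Direct Discharge")
    return wide.sort_index()

wide = to_wide(toy, "Uranium")
print(wide)
```

With matplotlib installed, `to_wide(df_umm, 'Uranium').plot(figsize=(16, 6), grid=True)` should draw one line per facility with the legend filled in from the column names.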
facilities = df_umm['Facility Name'].unique()
for f in facilities:
    df = df_umm[df_umm['Facility Name'] == f]
    print(f,'\n')
    subs = df['Substance Name (English)'].unique()
    for s in subs:
        df2 = df[df['Substance Name (English)'] == s]
        plt.figure(figsize=(12,4))
        plt.plot(df2['Year'], df2['Direct Discharge'], color='red')
        plt.title(s + ' - Direct Discharge', fontdict={'fontsize': 14, 'fontweight': 'medium'})
        plt.xlabel('Year', fontdict={'fontsize': 14, 'fontweight': 'medium'})
        plt.grid()
        plt.show()
Rabbit Lake
Key Lake
McArthur River
Cigar Lake
McClean Lake
df_umm_2020 = pd.read_csv("./Datasets/2020/Uranium Mines and Mills.csv")
df_umm_2020.head()
| | _id | Year \| Annee | NPRI ID \| ID INRP | Company Name \| Raison Sociale | Facility Name \| Nom de l'installation | City \| Ville | CSD \| SDR | CA or CMA \| AR ou RMR | Economic Region \| Region economique | Province \| Province | Latitude \| Latitude | Longitude \| Longitude | Substance Name (English) \| Nom de substance (Anglais) | Substance Name (French) \| Nom de substance (Francais) | Units \| Unites | Stack Emissions \| Emissions de cheminees | Direct Discharge \| Evacuations directes | Footnotes \| Notes de bas de page |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2020 | 1147 | Cameco | Rabbit Lake | Saskatoon | NaN | NaN | NaN | SK | 58.1978 | -103.7136 | Uranium | Uranium | kg | NRM \| NRS | 80.3 | NaN |
| 1 | 2 | 2020 | 1147 | Cameco | Rabbit Lake | Saskatoon | NaN | NaN | NaN | SK | 58.1978 | -103.7136 | Thorium-230 | Thorium-230 | MBq | NRM \| NRS | 75.6 | NaN |
| 2 | 3 | 2020 | 1147 | Cameco | Rabbit Lake | Saskatoon | NaN | NaN | NaN | SK | 58.1978 | -103.7136 | Radium-226 | Radium-226 | MBq | NRM \| NRS | 24.0 | NaN |
| 3 | 4 | 2020 | 1147 | Cameco | Rabbit Lake | Saskatoon | NaN | NaN | NaN | SK | 58.1978 | -103.7136 | Lead-210 | Plomb-210 | MBq | NRM \| NRS | 75.6 | NaN |
| 4 | 5 | 2020 | 1147 | Cameco | Rabbit Lake | Saskatoon | NaN | NaN | NaN | SK | 58.1978 | -103.7136 | Polonium-210 | Polonium-210 | MBq | NRM \| NRS | 32.1 | NaN |
# I will keep only the essential columns.
df_umm_2021 = df_umm_0[['Year | Année', "Facility Name | Nom de l'installation",
                        'Substance Name (English) | Nom de substance (Anglais)',
                        'Stack Emissions | Émissions de cheminées',
                        'Direct Discharge | Évacuations directes']]
# Assign the renamed frame instead of using inplace=True on a column selection, which can trigger SettingWithCopyWarning.
df_umm_2021 = df_umm_2021.rename(columns={'Year | Année': 'Year',
                                          "Facility Name | Nom de l'installation": 'Facility Name',
                                          'Substance Name (English) | Nom de substance (Anglais)': 'Substance Name (English)',
                                          'Stack Emissions | Émissions de cheminées': 'Stack Emissions',
                                          'Direct Discharge | Évacuations directes': 'Direct Discharge'})
df_umm_2021.head()
| | Year | Facility Name | Substance Name (English) | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|
| 0 | 2021 | Rabbit Lake | Uranium | NRM \| NRS | 68.9 |
| 1 | 2021 | Rabbit Lake | Thorium-230 | NRM \| NRS | 57.5 |
| 2 | 2021 | Rabbit Lake | Radium-226 | NRM \| NRS | 22.6 |
| 3 | 2021 | Rabbit Lake | Lead-210 | NRM \| NRS | 76.6 |
| 4 | 2021 | Rabbit Lake | Polonium-210 | NRM \| NRS | 43.1 |
# I will remove the 2021 rows from the new dataframe and compare the remaining years with the 2020 dataset.
df_umm_2021 = df_umm_2021[df_umm_2021['Year'] != 2021].reset_index(drop = True)
df_umm_2021.head()
| | Year | Facility Name | Substance Name (English) | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|
| 0 | 2020 | Rabbit Lake | Uranium | NRM \| NRS | 80.3 |
| 1 | 2020 | Rabbit Lake | Thorium-230 | NRM \| NRS | 75.6 |
| 2 | 2020 | Rabbit Lake | Radium-226 | NRM \| NRS | 24.0 |
| 3 | 2020 | Rabbit Lake | Lead-210 | NRM \| NRS | 75.6 |
| 4 | 2020 | Rabbit Lake | Polonium-210 | NRM \| NRS | 32.1 |
# I will keep only the essential columns for 2020 too.
df_umm_2020 = df_umm_2020[['Year | Annee', "Facility Name | Nom de l'installation",
                           'Substance Name (English) | Nom de substance (Anglais)',
                           'Stack Emissions | Emissions de cheminees',
                           'Direct Discharge | Evacuations directes']]
# Assign the renamed frame instead of using inplace=True on a column selection, which can trigger SettingWithCopyWarning.
df_umm_2020 = df_umm_2020.rename(columns={'Year | Annee': 'Year',
                                          "Facility Name | Nom de l'installation": 'Facility Name',
                                          'Substance Name (English) | Nom de substance (Anglais)': 'Substance Name (English)',
                                          'Stack Emissions | Emissions de cheminees': 'Stack Emissions',
                                          'Direct Discharge | Evacuations directes': 'Direct Discharge'})
df_umm_2020.head()
| | Year | Facility Name | Substance Name (English) | Stack Emissions | Direct Discharge |
|---|---|---|---|---|---|
| 0 | 2020 | Rabbit Lake | Uranium | NRM \| NRS | 80.3 |
| 1 | 2020 | Rabbit Lake | Thorium-230 | NRM \| NRS | 75.6 |
| 2 | 2020 | Rabbit Lake | Radium-226 | NRM \| NRS | 24.0 |
| 3 | 2020 | Rabbit Lake | Lead-210 | NRM \| NRS | 75.6 |
| 4 | 2020 | Rabbit Lake | Polonium-210 | NRM \| NRS | 32.1 |
# I will concatenate both dataframes and drop every row that appears in both (keep=False), so only the rows that differ remain. For each differing row, the 2021 release's values come first and the 2020 release's values after:
df = pd.concat([df_umm_2021,df_umm_2020]).drop_duplicates(keep=False)
df = df.sort_values(by=['Facility Name', 'Substance Name (English)', 'Year'])
df
# The result is empty: there are no differences between the two releases for the shared years.
| Year | Facility Name | Substance Name (English) | Stack Emissions | Direct Discharge |
|---|---|---|---|---|
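The concat-and-drop-duplicates check above tells us whether anything differs, but not which release a leftover row came from. As a complementary sketch, a keyed merge puts the old and new values side by side and labels rows that exist in only one release. The function name `changed_rows`, the choice of key columns, and the toy frames are assumptions of mine for illustration; the simulated revision below is not a real change in the CNSC data.

```python
import pandas as pd

# Columns assumed to identify one record across releases.
keys = ["Year", "Facility Name", "Substance Name (English)"]

# Illustrative stand-ins for two releases of the same dataset (NOT real CNSC values).
release_old = pd.DataFrame({
    "Year": [2019, 2020],
    "Facility Name": ["Facility A", "Facility A"],
    "Substance Name (English)": ["Uranium", "Uranium"],
    "Direct Discharge": [1.0, 2.0],
})
release_new = release_old.copy()
release_new.loc[release_new["Year"] == 2019, "Direct Discharge"] = 1.5  # simulated revision

def changed_rows(old, new, keys, value="Direct Discharge"):
    """Rows whose value differs between releases, or that exist in only one
    release (hypothetical helper)."""
    merged = old.merge(new, on=keys, how="outer",
                       suffixes=("_old", "_new"), indicator=True)
    differs = merged[f"{value}_old"] != merged[f"{value}_new"]
    # NaN != NaN evaluates to True, so exclude rows where both sides are missing.
    both_nan = merged[f"{value}_old"].isna() & merged[f"{value}_new"].isna()
    return merged[differs & ~both_nan]

diff = changed_rows(release_old, release_new, keys)
print(diff)
```

The `_merge` column produced by `indicator=True` reads `both`, `left_only`, or `right_only`, which separates revised values from records that were added or removed between releases. Applied to `df_umm_2020` and the filtered `df_umm_2021`, an empty result would agree with the check above.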